Abstract

We  extend  the  numerical  methods  of  [10],  known  as  the  Markov  chain 
approximation  methods,  to  controlled  general  nonlinear  delayed  reflected 
diffusion  models.  The  path  and  the  control  can  both  be  delayed.  For  the 
no-delay  case,  the  method  covers  virtually  all  models  of  current  interest. 

The method is robust, the approximations have physical interpretations
as control problems closely related to the original one, and there are many
effective methods for getting the approximations, and for solving the Bellman
equation for low-dimensional problems. These advantages carry over
to the delay problem. It is shown how to adapt the methods for getting
the approximations, and the convergence proofs are outlined for the
discounted cost function. Extensions to all of the cost functions of current
interest as well as to models with Poisson jump terms are possible.

The  paper  is  particularly  concerned  with  representations  of  the  state  and 
algorithms  that  minimize  the  memory  requirements. 

Key words: Optimal stochastic control, numerical methods, delay stochastic equations,
numerical methods for delayed controlled diffusions, Markov chain approximation
method.


Introduction 

The  aim  of  this  paper  is  to  extend  the  numerical  methods  of  [10],  known  as  the 
Markov  chain  approximation  methods,  to  controlled  delayed  diffusion  models. 
We work with a general reflected controlled diffusion model with delays. The
basic idea is to approximate the control problem by a control problem
for a suitable approximating Markov chain model, solve the Bellman equation
for the approximation, and then prove the convergence of the optimal costs to
that for the original problem as the approximation parameter goes to zero. The
method is robust, the approximations have physical interpretations as control
problems closely related to the original one, and there are many effective methods
for getting the approximations and for solving the Bellman equation if the
dimension is not too large. For example, for the no-delay case, the code [5, 11]
exploits  multigrid,  accelerated  Gauss-Seidel,  and  approximation  in  policy  space 
methods. 

The  applications  in  [10]  cover  virtually  all  of  the  usual  (non-delay)  models 
and  cost  functions,  including  jump-diffusions,  controlled  variance  and  jumps, 
singular  controls,  etc.  But  to  focus  on  the  issues  that  are  different  for  the 
delay  case  and  not  excessively  repeat  the  development  in  the  reference,  we  will 
consider  only  the  diffusion  case  without  variance  control.  The  extensions  to  the 
more  general  cases  cited  above  adapt  the  methods  in  the  reference  similarly  to 
what  is  done  for  the  simpler  case  treated  here. 

Models  for  many  physical  problems  have  reflecting  boundaries.  They  occur 
naturally  in  models  arising  in  queueing/communications  systems  [8],  where  the 
state  space  is  often  bounded  owing  to  the  finiteness  of  buffers  and  nonnegativity 
of  their  content,  and  the  internal  routing  determines  the  reflection  directions  on 
the  boundary.  Numerical  analysis  for  state-space  problems  is  usually  done  in 
a  bounded  region.  A  standard  way  of  bounding  a  state  space  is  to  impose  a 
reflecting  boundary,  if  one  does  not  exist  already.  One  selects  the  region  so  that 
the  boundary  plays  a  minor  role.  Alternatively,  one  can  stop  the  process  on  exit 
from  a  bounded  given  set.  In  order  to  focus  on  the  concepts  that  are  new  and 
essential  in  the  delay  case,  most  of  the  development  will  be  for  one-dimensional 
problems.  As  in  [10]  dimensionality  is  an  issue  only  in  that  the  computational 
time  increases  very  rapidly  as  the  dimension  increases.  Some  comments  on 
higher  dimensional  models  are  given  at  the  end  of  Section  5. 

Reference [7] has a brief discussion showing how the Markov chain approximation
applies to uncontrolled delay equations. Only the simplest procedure
was discussed, and there was no concern with the control problem, or with
computationally efficient representations of the state. There is an extensive
literature on the delayed linear system, quadratic cost, and white Gaussian noise
case [2, 4, 9, 12, 15]. As for the no-delay case, this is essentially the same whether
there is a driving noise or not. The problem reduces to the study of an abstract
Riccati equation. The paper [15] develops a “dual variable” approach to the
problem where the control and the path variables are delayed. The development
depends heavily on the linear structure and as yet has not been extended
to the general nonlinear stochastic reflected diffusion problem.

The  next  section  introduces  the  model  and  assumptions.  In  much  of  the 
development,  for  pedagogical  purposes  we  divide  the  discussion  into  a  part  where 
only  the  path  is  delayed  in  the  dynamics  and  a  part  where  both  the  control  and 
path  are  delayed.  The  cost  function  is  confined  to  the  discounted  case.  The 
existence  of  an  optimal  control  is  proved  in  Section  2.  This  is  of  interest  for 
its own sake, but is also an introduction to the weak convergence methods used
for the proof of convergence of the numerical procedure as the approximation
parameter goes to zero.

The  basic  ideas  and  proofs  of  the  Markov  chain  approximation  method  are 
extensions of those for the no-delay case in [10], yet are not obvious. For this
reason, a review of the no-delay case is useful to set the stage and refresh familiarity
with the basic concepts. This is done in Section 3. The fundamental assumption
required for the convergence of the numerical procedure is the so-called “local
consistency condition” [10]. This says no more than that the conditional mean
change (resp., variance) in the state of the approximating chain is proportional
to the drift (resp., covariance), modulo small errors. This would seem to be
a minimal condition. In general, it need not hold everywhere (see, e.g., [10,
Section 5.5]). To help focus ideas in the later discussion of the delay system, a
simple  example  of  construction  of  the  chain  is  given  and  various  related  matters 
discussed.  There  are  two  types  of  construction  that  are  of  interest,  called  the 
“explicit”  and  “implicit”  methods,  owing  to  the  similarity  of  one  particular 
way  of  constructing  them  to  methods  of  the  same  names  for  solving  parabolic 
PDE’s.  Each  has  an  interesting  role  to  play  for  the  delay  model.  In  Section 
4,  we  extend  the  concepts  of  Section  3  to  the  delay  case.  The  notion  of  local 
consistency  is  still  fundamental.  The  approximating  chains  are  constructed 
almost  exactly  as  they  are  for  the  no-delay  case.  The  proofs  of  convergence  in 
[10]  are  purely  probabilistic,  being  based  on  weak  convergence  methods.  The 
idea  is  to  interpolate  the  chain  to  a  continuous-time  process  in  a  suitable  manner, 
show  that  the  Bellman  equation  for  the  interpolation  is  the  same  as  for  the  chain, 
and  then  show  that  the  interpolated  processes  converge  to  an  optimal  diffusion 
as  the  approximating  parameter  goes  to  zero.  The  interpolations  of  interest  are 
introduced  and  the  convergence  theorems  are  stated  in  Section  4.  We  try  to 
bring  the  delay  case  into  a  form  where  the  results  of  [10]  can  be  appealed  to  for 
the  completion  of  the  algorithm  or  proof.  The  proof  of  convergence  is  in  Section 
8,  where  mainly  the  parts  of  the  convergence  proof  that  are  different  for  the 
delay  case  are  given.  We  have  tried  to  present  the  minimal  details  that  yield  a 
coherent  picture  of  the  convergence  proofs. 

The  state  of  the  problem,  as  needed  for  the  numerical  procedure,  consists 
of  a  segment  of  the  path  (over  the  delay  interval)  and  of  the  control  path  as 
well  (if  the  control  is  also  delayed).  This  can  consume  a  lot  of  memory.  Sec¬ 
tion  5  is  concerned  with  efficient  representations  of  the  state  data  for  chains 
constructed  by  the  “explicit”  method.  Sections  6  and  7  are  concerned  with  the 
“implicit” method,”  which  can  be  advantageous  as  far  as  memory  requirements 
are  concerned.  In  these  sections,  attention  is  confined  to  to  the  case  where  only 
the  path  is  delayed.  If  the  control  is  also  delayed,  then  the  problem  is  more 
complicated  and  reasonably  efficient  representations  are  not  yet  available. 




1  The  Model 


The model. The maximum delay is denoted by $\tau > 0$. The solution path is
denoted by $x(\cdot)$. Let $U$ denote the compact control-value space. The controls
$u(\cdot)$ are $U$-valued, measurable and nonanticipative with respect to the driving
Wiener process (this defines the set of admissible controls). $\bar{x}(t,\theta)$ is used to
denote the path segment $x(t+\theta)$, $\theta \in [-\tau, 0]$, and we write $\bar{x}(t) = \bar{x}(t,\cdot)$.
The function $\bar{u}(t)$ is defined analogously from $u(\cdot)$. The solution $x(t)$ is confined
to a finite interval $G = [0, B]$ by reflection. The reflection process is denoted
by $z(\cdot)$. Its purpose is to assure that $x(t)$ does not escape from the interval
$G$, and it is the minimum “force” necessary. For more information on reflected
(non-delay) SDE’s see [10] or [8].

Delay in the path only. First consider systems where the control is not delayed
and we use the reflected controlled delayed stochastic differential equation
model, where $w(\cdot)$ is a standard Wiener process:¹

$$dx(t) = b(\bar{x}(t), u(t))\,dt + \sigma(\bar{x}(t))\,dw(t) + dz(t), \quad \bar{x}(0) \text{ given}. \tag{1.1}$$

We can write $z(t) = y_1(t) - y_2(t)$, where $y_1(\cdot)$ (resp., $y_2(\cdot)$) is a continuous and
nondecreasing process that can increase only at $t$ where $x(t) = 0$ (resp., where
$x(t) = B$). Except for the LQG problem (without a reflection term) (see, e.g.,
[2]), little is known about the control of such systems. An example of (1.1) is

$$dx(t) = b_1(x(t), u(t))\,dt + b_2(x(t-\tau), u(t))\,dt + \sigma(x(t))\,dw(t) + dz(t).$$
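To make the roles of the delayed state and of the reflection term concrete, the example equation can be simulated on a time grid. The sketch below is only an illustration and is not the Markov chain approximation method of this paper: it assumes a projected Euler-Maruyama step, with reflection approximated by clipping the path back into $[0, B]$ and the clipped amounts accumulated as increments of $y_1$ and $y_2$. The function name and all arguments are hypothetical.

```python
import math
import random


def simulate_reflected_delay_path(b1, b2, sigma, u, x_init, tau, B, dt, n_steps, seed=0):
    """Projected Euler-Maruyama sketch (an assumed scheme, not the paper's method) for
    dx = b1(x(t), u(t))dt + b2(x(t - tau), u(t))dt + sigma(x(t))dw + dz on [0, B].
    x_init(theta) supplies the initial segment for theta in [-tau, 0]."""
    rng = random.Random(seed)
    lag = int(round(tau / dt))  # number of grid steps in one delay interval
    # path[i] holds x at time (i - lag) * dt, so path[lag] is x(0)
    path = [x_init((i - lag) * dt) for i in range(lag + 1)]
    y1 = y2 = 0.0  # accumulated reflection at 0 and at B
    for n in range(n_steps):
        t = n * dt
        x_now, x_delayed = path[-1], path[-1 - lag]
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_free = x_now + (b1(x_now, u(t)) + b2(x_delayed, u(t))) * dt + sigma(x_now) * dw
        x_next = min(max(x_free, 0.0), B)  # clip back into [0, B]
        if x_free < 0.0:
            y1 += -x_free       # push up from the lower boundary
        elif x_free > B:
            y2 += x_free - B    # push down from the upper boundary
        path.append(x_next)
    return path[lag:], y1, y2
```

Only the last `lag + 1` path values are needed to take a step, which is the discrete analog of the path segment over the delay interval serving as the state.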

For a set $S$ in some metric space and $t_1 < t_2$, let $D[S; t_1, t_2]$ denote the
set of $S$-valued functions that are right continuous on $[t_1, t_2)$ and have left-hand
limits on $(t_1, t_2]$, endowed with the Skorohod topology [1, 3]. If $S$ is the set of real
numbers, then we write just $D[t_1, t_2]$, or $D[t_1, \infty)$ if $t_2 = \infty$. Since $b(\cdot)$ depends
on the “segment” of the $x(\cdot)$-process over an interval of length $\tau$, its domain is a
function space and we need to define the space of such segments. In work on the
mathematics of delay equations it is common to use the path spaces $C[-\tau, 0]$
(with the sup norm topology) or $L_2[-\tau, 0]$. Any of these could be used here.
But the Skorohod space $D[-\tau, 0]$ is more appropriate for the approximation
and weak convergence analysis of concern and involves no loss of generality. If
the model is extended to include a Poisson-type jump term, then the use of
$D[-\tau, 0]$ is indispensable. Note that if $f^n(\cdot)$ converges to $f(\cdot)$ in $D[t_1, t_2]$ and
$f(\cdot)$ is continuous, then the convergence is uniform on any finite interval.

We will use $x, y$ (or $x(\cdot), y(\cdot)$) to denote the canonical point in $D[G; -\tau, 0]$.
A shortcoming of the Skorohod topology is that the function defined by $f(x) = x(t)$,
for any fixed $t \in [-\tau, 0]$, is not continuous (it is measurable). But it is
continuous at all points $x(\cdot)$ that are continuous functions. In our case, all the
solution paths $x(\cdot)$ will be continuous. The following assumption covers the
common case where $b(\bar{x}(t), \alpha) = \sum_i b_i(x(t - \tau_i), \alpha)$, $0 \le \tau_i \le \tau$, where the $b_i(\cdot)$
are continuous.

¹By adapting the techniques in [10], a driving Poisson jump process and controlled variance
can also be treated, as can singular controls, boundary absorption and optimal stopping. But
here we aim to concentrate on the issues arising due to the delay without overly complicating
the development.

A1.1. $b(\cdot)$ is bounded and measurable and is continuous on $D[G; -\tau, 0] \times U$ at
each point $(x(\cdot), \alpha)$ such that $x(\cdot)$ is continuous. The function $\sigma(\cdot)$ is bounded
and measurable and is continuous on $D[G; -\tau, 0]$ at each point $x(\cdot)$ that is
continuous.


Relaxed controls. For purposes of proving approximation and limit theorems,
it is usual and very convenient to work in terms of relaxed controls.
Recall the definition of a relaxed control $m(\cdot)$ [10]. It is a measure on the
Borel sets of $U \times [0, \infty)$, with $m(A \times [0, \cdot])$ being measurable and nonanticipative
with respect to $w(\cdot)$ for each Borel $A \subset U$, and satisfying
$m(U \times [0, t]) = t$. Write $m(A, t) = m(A \times [0, t])$. The left-hand derivative²
$m'(d\alpha, t) = \lim_{0 < \delta \to 0} [m(d\alpha, t) - m(d\alpha, t - \delta)]/\delta$ is defined for almost all $(\omega, t)$.
By the definitions, $m(d\alpha\, ds) = m'(d\alpha, s)\,ds$. For $0 \le v \le \tau$, we write $m(d\alpha, ds - v)$ for
$m(d\alpha, s - v) - m(d\alpha, s - ds - v)$. The weak topology is used on the relaxed
controls. Thus $m^n(\cdot)$ converges to $m(\cdot)$ if and only if
$\int\!\!\int \phi(\alpha, s)\, m^n(d\alpha\, ds) \to \int\!\!\int \phi(\alpha, s)\, m(d\alpha\, ds)$
for all continuous functions $\phi(\cdot)$ with compact support.
With this topology, the space of relaxed controls is compact. An ordinary
control $u(\cdot)$ can be written as the relaxed control $m(\cdot)$ defined by its derivative
$m'(A, t) = I_{\{u(t) \in A\}}$, where $I_K$ is the indicator function of the set $K$. Then
$m(A, t)$ is the amount of time that the control takes values in the set $A$ by time
$t$.
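The relaxed control representation of an ordinary control can be illustrated with a small computation. The sketch below assumes a piecewise-constant ordinary control on a uniform time grid (an assumption made only for this illustration; the function name and arguments are hypothetical) and evaluates $m(A, t)$ as the time spent in the set $A$ up to time $t$.

```python
def relaxed_measure(control_values, dt, A, t):
    """m(A, t) for the ordinary control that equals control_values[i]
    on [i*dt, (i+1)*dt): the total time in [0, t] with u(s) in A."""
    total = 0.0
    for i, a in enumerate(control_values):
        start, end = i * dt, (i + 1) * dt
        overlap = max(0.0, min(end, t) - start)  # length of [start, end) lying in [0, t]
        if a in A:
            total += overlap
    return total
```

Taking $A$ to be the whole control-value set recovers the normalization $m(U \times [0, t]) = t$, as long as $t$ does not exceed the horizon covered by `control_values`.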

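Rewriting (1.1) in terms of relaxed controls yields

$$\begin{aligned}
x(t) &= x(0) + \int_0^t\!\int_U b(\bar{x}(s), \alpha)\, m(d\alpha\, ds) + \int_0^t \sigma(\bar{x}(s))\,dw(s) + z(t) \\
&= x(0) + \int_0^t\!\int_U b(\bar{x}(s), \alpha)\, m'(d\alpha, s)\,ds + \int_0^t \sigma(\bar{x}(s))\,dw(s) + z(t).
\end{aligned} \tag{1.2}$$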


Delays in the control. We will also consider the problem where the control as
well as the path is delayed. Let $B[U; -\tau, 0]$ be the space of measurable functions
on $[-\tau, 0]$ with values in $U$, and let $u(\cdot)$ denote a canonical element of $B[U; -\tau, 0]$.
Then the dynamical term $b(\cdot)$ becomes a function of both $x, u$. Depending on
the applications of interest, there are a variety of choices for the way that the
control appears in $b(\cdot)$. We will use the following quite general assumption,
where $\bar{u}(t)$ denotes the function $u(t + \theta)$, $\theta \in [-\tau, 0]$.

A1.2. Let $\mu(\cdot)$ be a bounded measure on the Borel sets of $[-\tau, 0]$ and let $b(\cdot)$ be
a bounded measurable function on $D[G; -\tau, 0] \times U \times [-\tau, 0]$. For each $v \in [-\tau, 0]$,
$b(x, \alpha, v)$ is continuous in $(x, \alpha)$ at each point $x$ that is a continuous function.
For ordinary controls $u(\cdot)$, the drift term at time $t$ is assumed to have the form
(replacing $b(\bar{x}(t), u(t))$ in (1.1))

$$b(\bar{x}(t), \bar{u}(t)) = \int_{-\tau}^0 b(\bar{x}(t), u(t + v), v)\,\mu(dv).$$

For a relaxed control $m(\cdot)$, the integral of the drift term then has the form

$$\int_0^t\!\int_{-\tau}^0\!\int_U b(\bar{x}(s), \alpha, v)\, m'(d\alpha, s + v)\,\mu(dv)\,ds
= \int_{-\tau}^0 \left[\int_0^t\!\int_U b(\bar{x}(s), \alpha, v)\, m(d\alpha, ds + v)\right] \mu(dv). \tag{1.3}$$

²In [10] $m_t$ was used to denote the derivative. But this notation would be confusing in the
context of the notation required to represent the various delays in this paper.


An example of the general form covered by (A1.2) is, for $0 \le \tau_i \le \tau$,

dx{t)  =  x{t)x{t  —  Ti)u{t  —  T2)dt  +  w^(t  —  To)dt  +  bo{x{t))dt  +  a{x{t))dw  +  dz{t), 


in which case $\mu(\cdot)$ is concentrated on the two points $\{-\tau_2, -\tau_3\}$. What is not
covered are “cross” terms in the control such as $u(t - \tau_1)u(t - \tau_2)$, where $\tau_1 \ne \tau_2$.
The full system equation is

$$x(t) = x(0) + \int_{-\tau}^0 \left[\int_0^t\!\int_U b(\bar{x}(s), \alpha, v)\, m(d\alpha, ds + v)\right] \mu(dv) + \int_0^t \sigma(\bar{x}(s))\,dw(s) + z(t), \tag{1.4}$$

and the initial data is $(\bar{x}(0), \bar{u}(0))$. Let $\bar{m}(t)$ denote the segment of $m(\cdot)$ on
$[t - \tau, t]$.


Weak-sense solutions. If $w(\cdot)$ is a Wiener process on $[0, \infty)$ and $m(\cdot)$ is
a relaxed control on the same probability space that is defined on either
$[-\tau, \infty)$ or $[0, \infty)$ and is nonanticipative with respect to $w(\cdot)$, then we say
that the pair is admissible or, if $w(\cdot)$ is understood, that $m(\cdot)$ is admissible.
Suppose that, given any admissible pair $(w_1(\cdot), m_1(\cdot))$ and $\bar{x}_1(0)$ defined on the
same probability space, with $(m_1(\cdot), \bar{x}_1(0))$ nonanticipative with respect to $w_1(\cdot)$,
there is a probability space on which is defined a set $(x(\cdot), w(\cdot), m(\cdot), z(\cdot))$
solving (1.1) or (1.4), where $(x(\cdot), m(\cdot), z(\cdot))$ is nonanticipative with respect
to the Wiener process $w(\cdot)$, $(w(\cdot), m(\cdot), \bar{x}(0))$ has the same probability law as
$(w_1(\cdot), m_1(\cdot), \bar{x}_1(0))$, and the probability law of the solution set does not depend
on the probability space. Then we say that there is a solution in the weak sense
[6]. If the control is delayed, then it will be defined on the interval $[-\tau, \infty)$. Thus
the nonanticipativity of $(m(\cdot), \bar{x}(0))$ implies the independence of $(\bar{m}(0), \bar{x}(0))$
and $w(\cdot)$. Such independence will always hold. We always assume the following
condition.

A1.3. There is a weak-sense unique weak-sense solution to (1.1) and (1.4) for
each admissible relaxed control and initial data.

The techniques that are used to prove existence and uniqueness for the no-delay
problem can be adapted to the delay problem. For example, use (A1.2) and
the Lipschitz condition $|b(x, \alpha, v) - b(y, \alpha, v)| \le K \sup_{-\tau \le s \le 0} |x(s) - y(s)|$, and
a standard Picard iteration. Alternatively, one can use the Girsanov measure
transformation methods [8, 10]. See also [13, Section 1.7] for the uncontrolled
problem.

The discounted cost function. To simplify the discussion, throughout the
paper we focus on a discounted cost function. Let $\beta > 0$, let $c = (c_1, c_2)$ be a
given constant, and let $E_x^m$ denote the expectation under the initial condition
$x = \bar{x}(0)$, when the relaxed control $m(\cdot)$ is used. Then the cost function for
(1.1) is

$$W(x, m) = E_x^m \int_0^\infty e^{-\beta t} \left[\int_U k(\bar{x}(t), \alpha)\, m'(d\alpha, t)\,dt + c'\,dy(t)\right], \tag{1.5}$$

$$V(x) = \inf_m W(x, m),$$

where the inf is over the admissible relaxed controls, and $k(\cdot)$ is assumed to
satisfy the conditions on $b(\cdot)$ in (A1.1).

For a relaxed control $m(\cdot)$, let $\bar{m}(0)$ denote the segment $m(\cdot, s)$, $s \in [-\tau, 0]$.
Write $\hat{m}$ for the canonical value of $\bar{m}(0)$. For (1.4), the cost function is, where
$k(\cdot)$ is assumed to satisfy the conditions on $b(\cdot)$ in (A1.2),

$$W(x, m) = E_x^m \int_0^\infty e^{-\beta t} \left[\int_{-\tau}^0\!\int_U k(\bar{x}(t), \alpha, v)\, m'(d\alpha, t + v)\,\mu(dv)\,dt + c'\,dy(t)\right], \tag{1.6}$$

$$V(x, \hat{m}) = \inf_m W(x, m),$$

where the infimum is over all relaxed controls with initial segments $\bar{m}(0) = \hat{m}$.³
Recall that, in our notation, for $v \ge 0$, $m'(d\alpha, t - v)\,dt = m(d\alpha, dt - v)$.


2 Preliminary Results: Existence of an Optimal Control

Theorem 2.1 establishes the existence of an optimal relaxed control. Since (1.1)
is a special case of (1.4), we work with (1.4). The proof of existence closely
follows the standard procedure for the no-delay problem, say that of [10, Theorem
10.2.1], and we will only outline the procedure and note the differences.
Theorem 2.2 says that approximations to the controls give approximations to
the cost, and the proof is nearly identical to that of Theorem 2.1. Theorem 2.3
asserts that the use of relaxed controls does not affect the infimum of the costs.
The no-delay form is [10, Theorem 10.1.2] and the proof is omitted since the
adjustments for the present case are readily made, given the comments in the
proof of Theorem 2.1.

³Since we are working with weak-sense solutions, the Wiener process might not be fixed.
For example, if Girsanov measure transformation methods are used, then the Wiener process
will depend on the control. Then the inf in (1.5) or (1.6) should be over all admissible pairs
$(m(\cdot), w(\cdot))$, with the given initial data. But to simplify the notation, we write simply $\inf_m$.
This is essentially a theoretical issue. The numerical procedures give feedback controls and all
that we need to know is that there is an optimal value function to which the approximating
values will converge.


Theorem 2.1. Assume (A1.2)-(A1.3). Then there is an optimal control for any
fixed initial condition $x, \hat{m}$, where $x$ is continuous on $[-\tau, 0]$. I.e., there is a
set $(x(\cdot), w(\cdot), m(\cdot), z(\cdot))$ solving (1.4), where $(x(\cdot), m(\cdot), z(\cdot))$ is nonanticipative
with respect to the Wiener process $w(\cdot)$, $\bar{m}(0) = \hat{m}$, $\bar{x}(0) = x$, and $W(x, m) =
V(x, \hat{m})$.


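Proof. Let $m^n(\cdot)$ be a minimizing sequence of relaxed controls, with associated
solutions $x^n(\cdot), z^n(\cdot)$, with $\bar{x}^n(0) = x$, Wiener processes $w^n(\cdot)$, and $\bar{m}^n(0) = \hat{m}$.
Write $z^n(\cdot) = y_1^n(\cdot) - y_2^n(\cdot)$. Thus

$$x^n(t) = x(0) + \int_{-\tau}^0 \left[\int_0^t\!\int_U b(\bar{x}^n(s), \alpha, v)\, m^n(d\alpha, ds + v)\right] \mu(dv) + \int_0^t \sigma(\bar{x}^n(s))\,dw^n(s) + z^n(t). \tag{2.1}$$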

The sequence $(x^n(\cdot), m^n(\cdot), y^n(\cdot), w^n(\cdot))$ is tight and all weak-sense limit processes
are continuous, as follows. The tightness of the $(w^n(\cdot), m^n(\cdot))$ and the
continuity of their weak-sense limits are obvious, as is the Wiener property
of any weak-sense limit of $w^n(\cdot)$. The processes defined by the ordinary and
stochastic integral terms of (2.1) are also tight and all weak-sense limits are
continuous. The tightness and asymptotic continuity of the $y^n(\cdot)$ can be proved
by contradiction. If it is false, then there will be a jump of $x^n(\cdot)$, asymptotically,
into the interior of $[0, B]$, which is impossible, since the $y^n(\cdot)$ can change only on
the boundary. Thus the asymptotic continuity assertion holds for $(x^n(\cdot), y^n(\cdot))$.

Now, take a weakly convergent subsequence with limit $(x(\cdot), m(\cdot), y(\cdot), w(\cdot))$,
index it by $n$ also, and use the Skorohod representation [3, page 102], so that
we can assume that the convergence is w.p.1 in the topologies of the spaces of
concern. By the weak convergence, we must have $x(t) \in [0, B]$, and $y_1(\cdot)$
(resp., $y_2(\cdot)$) can change only at $t$ where $x(t) = 0$ (resp., where $x(t) = B$). Since
$\sup_{s \le t} |x^n(s) - x(s)| \to 0$ for each $t > 0$, $\bar{x}^n(t) \to \bar{x}(t)$, uniformly on any finite
interval, all w.p.1. Then, (A1.2) implies that, for all $v \in [-\tau, 0]$,

$$\sup_{s \le t,\, \alpha} \left|b(\bar{x}^n(s), \alpha, v) - b(\bar{x}(s), \alpha, v)\right| \to 0$$

w.p.1, and also for $\sigma(\cdot), k(\cdot)$ replacing $b(\cdot)$. The last sentence and the continuity
and boundedness assumptions (A1.2) yield

$$\int_0^t\!\int_U b(\bar{x}^n(s), \alpha, v)\, m^n(d\alpha, ds + v) \to \int_0^t\!\int_U b(\bar{x}(s), \alpha, v)\, m(d\alpha, ds + v)$$

for all $t \ge 0$, $v \in [-\tau, 0]$, w.p.1. From this it follows that the first integral in
(2.1) converges to the process obtained when the superscript $n$ is dropped.
Nonanticipativity is shown as follows, also following the reference. Let
$g_j(\cdot)$, $j \le J$, be continuous functions with compact support and write

$$(m, g_j)(t) = \int_0^t\!\int_U g_j(\alpha, s)\, m(d\alpha\, ds).$$

For arbitrary $t > 0$ and integer $I > 0$, let $s_i \le t$ for $i \le I$, and let $h(\cdot)$ be an
arbitrary bounded and continuous function. By the nonanticipativity for each
$n$,

$$E\, h\big(x^n(s_i), w^n(s_i), y^n(s_i), (m^n, g_j)(s_i),\ i \le I,\ j \le J\big) \times \big(w^n(t + T) - w^n(t)\big) = 0. \tag{2.2}$$

By the weak convergence and the continuity of the limit processes, (2.2) holds
with the superscript $n$ dropped. Now, the arbitrariness of $h(\cdot), I, J, s_i, t, g_j(\cdot)$
implies that $w(\cdot)$ is a martingale with respect to the sigma-algebra generated
by $(x(\cdot), w(\cdot), z(\cdot), m(\cdot))$. Hence the nonanticipativity of the limit processes.

The convergence of the stochastic integral is obtained by an approximation
argument. For a measurable function $f(\cdot)$ and $\kappa > 0$, let $f_\kappa(\cdot)$ be the approximation
that takes the value $f(n\kappa)$ on $[n\kappa, n\kappa + \kappa)$. Then, by the weak
convergence,

$$\int_0^t \sigma(\bar{x}^n_\kappa(s))\,dw^n(s) \to \int_0^t \sigma(\bar{x}_\kappa(s))\,dw(s).$$

The left side can be made arbitrarily close to the stochastic integral in (2.1),
in the mean square sense, uniformly in $n$, by choosing $\kappa$ small enough. By the
nonanticipativity, as $\kappa \to 0$ the right hand term converges to the stochastic
integral with $x(\cdot)$ replacing $x_\kappa(\cdot)$. Finally, by the weak convergence and the
minimizing property of $m^n(\cdot)$, $W(x, m^n) \to W(x, m) = V(x, \hat{m})$, the infimum of
the costs. ■


Theorem 2.2. Assume (A1.2)-(A1.3). Let admissible $(\bar{x}^n(0), \bar{m}^n(0))$ converge
weakly to $(x, \hat{m})$. Then $V(\bar{x}^n(0), \bar{m}^n(0)) \to V(x, \hat{m})$.

The next theorem asserts that the use of relaxed controls does not change
the minimal values. See [10, Theorem 10.1.2] for the no-delay case. The proof
depends on the fact that for any relaxed control $m(\cdot)$ one can find a sequence
of ordinary controls $u^n(\cdot)$, each taking a finite number of values in $U$, such that
$(x, m^n(\cdot), w(\cdot))$ converges weakly to $(x, m(\cdot), w(\cdot))$, where $m^n(\cdot)$ is the relaxed
control representation of $u^n(\cdot)$.

Theorem 2.3. Assume (A1.2)-(A1.3). Fix the initial control segment $u_1(\theta)$, $\theta \in
[-\tau, 0]$, and let $\bar{m}_1(0)$ be the relaxed control representation of this ordinary control
segment $\bar{u}_1(0)$. Then

$$\inf_{u(\cdot)} W(x, u) = \inf_m W(x, m),$$

where the $\inf_u$ (resp., $\inf_m$) is over all controls (resp., relaxed controls) with
initial segments $\bar{u}_1(0)$ (resp., $\bar{m}_1(0)$).


3 A Markov Chain Approximation: The No-Delay Case


Local consistency. In this section we review the basic ideas for the no-delay
case as preparation for the treatment of the delay case in the next section.
Keep in mind that the restriction to one dimension is for expository simplicity
only. The numerical method is based on a Markov chain approximation to the
diffusion [10]. For an approximation parameter $h > 0$, and $B$ assumed to be an
integral multiple of $h$, discretize the interval $[0, B]$ as $G_h = \{0, h, \ldots, B - h, B\}$.
At 0 or $B$, the process (1.1) or (1.4) is still a diffusion and the drift or stochastic
integral terms might try to force it out of $[0, B]$. But then the reflection process
cancels those effects and prevents exit. For the approximation, the analog is
to still approximate the diffusion at 0 and $B$, as at any interior point of $G_h$.
But if the chain goes from 0 to $-h$ or from $B$ to $B + h$, it will be immediately
reflected back to $G_h$. The full state space, including the reflecting boundary
points, is denoted by $G_h^+ = \{-h, 0, h, \ldots, B, B + h\}$. Let $U^h$ be a finite set
such that the Hausdorff distance between $U^h$ and $U$ goes to zero as $h \to 0$. Let
$\{\xi_n^h, n < \infty\}$ be a controlled discrete parameter Markov chain on the discrete
state space $G_h^+$ with transition probabilities denoted by $p^h(x, y|\alpha)$. The $\alpha$ is the
control parameter and takes values in $U^h$. We use $u_n^h$ to denote the random
variable which is the actual control action for the chain at discrete time $n$. In
addition, suppose that we have an “interpolation interval” $\Delta t^h(x, \alpha)$ such that
$\sup_{x,\alpha} \Delta t^h(x, \alpha) \to 0$ as $h \to 0$, but $\Delta t^h(x, \alpha) > 0$ for each $h > 0$ and $x \in G_h$.
As will be seen, getting such an interval is always an automatic byproduct of
getting the transition probabilities. Define $\Delta t_n^h = \Delta t^h(\xi_n^h, u_n^h)$.

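The distribution of $\xi_{n+1}^h$, conditioned on $\xi_i^h, u_i^h$, $i \le n$, will depend only on
$\xi_n^h, u_n^h$ and not on $n$ otherwise. Thus, let $E_x^{h,\alpha}$ denote the conditional expectation
given $u_0^h = \alpha$, $\xi_0^h = x$. Define $\Delta\xi_n^h = \xi_{n+1}^h - \xi_n^h$ and the martingale
difference⁴

$$\beta_n^h = \Delta\xi_n^h - E\left[\Delta\xi_n^h \mid \xi_i^h, u_i^h, i \le n\right].$$

The key condition for convergence of the numerical procedure is the following
“local consistency” condition for $\{\xi_n^h\}$. The equalities define $b^h(\cdot)$ and $a^h(\cdot)$. For
$x \in G_h$ we require:

$$\begin{aligned}
E_x^{h,\alpha} \Delta\xi_0^h &= b^h(x, \alpha)\Delta t^h(x, \alpha) = b(x, \alpha)\Delta t^h(x, \alpha) + o(\Delta t^h(x, \alpha)), \\
E_x^{h,\alpha} \beta_0^h [\beta_0^h]' &= a^h(x)\Delta t^h(x, \alpha) = a(x)\Delta t^h(x, \alpha) + o(\Delta t^h(x, \alpha)), \\
a(x) &= \sigma(x)\sigma'(x), \\
\sup_{n,\omega} |\xi_{n+1}^h - \xi_n^h| &\to 0, \quad \sup_{x,\alpha} \Delta t^h(x, \alpha) \to 0.
\end{aligned} \tag{3.1}$$

Set $p^h(-h, 0|\alpha) = p^h(B + h, B|\alpha) = 1$ and $\Delta t^h(-h, \alpha) = \Delta t^h(B + h, \alpha) = 0$, so
that the states $-h, B + h$ are instantaneously reflecting.

⁴Here and in the sequel, when we say that some process derived from the chain is a
martingale or martingale difference, the relevant filtration is that generated by the path and
control data.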


An example of construction of the transition probabilities. Straightforward and automatic procedures for getting the $p^h(\cdot)$, $\Delta t^h(\cdot)$ are discussed in [10]. To facilitate understanding the general method of adapting the algorithms to the delay case in the next section, we will work with one particular simple approximation. This method is used for illustrative purposes only, and any of the procedures for getting the approximating chain in [10] could be used instead. The method to be discussed gets suitable approximations whether or not the $W(\cdot)$ introduced below has any derivatives. Its use of finite differences is only a formal device. The proofs of convergence are all probabilistic, somewhat analogous to that of Theorem 2.1. Let $W(\cdot)$ be a purely formal solution to the PDE

$$\frac{1}{2}\sigma^2 W_{xx}(x) + bW_x(x) + k(x,\alpha) = 0,$$

where $W_x(\cdot)$ and $W_{xx}(\cdot)$ denote the first and second derivatives with respect to $x$. Suppose that $\sigma^2(x) \ge h|b(x,\alpha)|$ for all $x, \alpha$, and use the finite difference approximations


$$f_x(x) \to \frac{f(x+h) - f(x-h)}{2h}, \qquad f_{xx}(x) \to \frac{f(x+h) + f(x-h) - 2f(x)}{h^2}. \tag{3.2}$$


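For $x \in G_h$, this leads to the finite difference approximation

$$W^h(x) = p^h(x, x+h\mid\alpha)W^h(x+h) + p^h(x, x-h\mid\alpha)W^h(x-h) + \Delta t^h(x,\alpha)k(x,\alpha),$$

where

$$p^h(x, x\pm h\mid\alpha) = \frac{\sigma^2(x) \pm hb(x,\alpha)}{2\sigma^2(x)}, \qquad \Delta t^h(x,\alpha) = \frac{h^2}{\sigma^2(x)}, \qquad x \in G_h. \tag{3.3}$$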


Condition (3.1) clearly holds, so that the terms in (3.3) can serve as the transition probabilities and interpolation interval for the approximating chain. In the delay case, the correct functional dependence of $b(\cdot)$, $\sigma(\cdot)$, $k(\cdot)$ on the path segment will need to be used.

Footnote: If $\sigma^2(x) < h|b(x,\alpha)|$ at some $x, \alpha$, then one-sided difference approximations can be used there to get the appropriate $p^h(\cdot)$, $\Delta t^h(\cdot)$ [10].
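The check that (3.3) satisfies (3.1) can be illustrated numerically. The following is a minimal sketch, not the paper's code: the drift `b_example`, the diffusion coefficient `sigma_example`, and the function names are hypothetical choices made only for illustration. It computes the transition probabilities and interpolation interval of (3.3) at one grid point and compares the first two local moments of the chain with $b\,\Delta t^h$ and $\sigma^2\Delta t^h$.

```python
def explicit_chain(x, alpha, h, b, sigma):
    """Return (p_up, p_down, dt) from (3.3) at an interior grid point x."""
    s2 = sigma(x) ** 2
    drift = b(x, alpha)
    if s2 < h * abs(drift):
        raise ValueError("(3.3) needs sigma^2(x) >= h*|b(x, alpha)|")
    p_up = (s2 + h * drift) / (2.0 * s2)
    p_down = (s2 - h * drift) / (2.0 * s2)
    dt = h * h / s2
    return p_up, p_down, dt


def local_moments(p_up, p_down, h):
    """Mean and variance of one step of size +h or -h."""
    mean = h * (p_up - p_down)
    var = h * h * (p_up + p_down) - mean ** 2
    return mean, var


# Hypothetical model coefficients, used only for this illustration.
def b_example(x, alpha):
    return -x + alpha


def sigma_example(x):
    return 1.0 + 0.5 * x


h = 0.01
p_up, p_down, dt = explicit_chain(0.3, 0.5, h, b_example, sigma_example)
mean, var = local_moments(p_up, p_down, h)
# mean/dt reproduces the drift; var/dt differs from sigma^2 by a term of order dt.
print(mean / dt, var / dt)
```

The mean per unit interpolated time equals the drift exactly, while the variance per unit time differs from $\sigma^2(x)$ by $b^2(x,\alpha)\Delta t^h(x,\alpha)$, which is the $o(\Delta t^h)$ term allowed by (3.1).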


The Bellman equation. After getting the approximating chain, one approximates the cost and writes the Bellman equation, which is

$$V^h(x) = \inf_{\alpha\in U^h}\Big\{e^{-\beta\Delta t^h(x,\alpha)}\sum_y p^h(x,y\mid\alpha)V^h(y) + k(x,\alpha)\Delta t^h(x,\alpha)\Big\}, \qquad x \in G_h, \tag{3.4a}$$

and, for the boundary states,

$$V^h(-h) = V^h(0) + c_1h, \qquad V^h(B+h) = V^h(B) + c_2h. \tag{3.4b}$$


The form (3.4a) shows why the $\Delta t^h(x,\alpha)$ were called interpolation intervals. The procedure just used might be referred to henceforth as the “explicit” method, in analogy to the procedure of the same name based on the approximation (3.2) for solving parabolic PDEs by finite differences. A special case of the results in [10, Chapter 10] is that $V^h(x) \to V(x)$, the minimal cost, as $h \to 0$.
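A solution of (3.4) can be approximated by successive approximation, since the discounting makes the right side a contraction. The sketch below is only one possible illustration, assuming the explicit chain (3.3), a finite control set, and hypothetical coefficient functions `b`, `sigma`, `k` passed in by the caller; none of these names come from the paper.

```python
import numpy as np


def solve_bellman(B, h, controls, b, sigma, k, beta, c1, c2,
                  tol=1e-10, max_iter=200_000):
    """Iterate (3.4a)-(3.4b) on G_h = {0, h, ..., B} using the chain (3.3)."""
    n = int(round(B / h)) + 1
    x = np.linspace(0.0, B, n)           # the grid G_h
    s2 = sigma(x) ** 2
    dt = h * h / s2                      # interpolation interval of (3.3)
    V = np.zeros(n + 2)                  # V[0] ~ state -h, V[-1] ~ state B+h
    for _ in range(max_iter):
        V[0] = V[1] + c1 * h             # reflecting states, (3.4b)
        V[-1] = V[-2] + c2 * h
        best = np.full(n, np.inf)
        for a in controls:
            p_up = (s2 + h * b(x, a)) / (2.0 * s2)
            cont = p_up * V[2:] + (1.0 - p_up) * V[:-2]
            best = np.minimum(best, np.exp(-beta * dt) * cont + k(x, a) * dt)
        change = np.max(np.abs(best - V[1:-1]))
        V[1:-1] = best
        if change < tol:
            break
    return x, V[1:-1].copy()


# Hypothetical example: drift equal to the control, unit diffusion, quadratic cost.
x, V = solve_bellman(
    B=1.0, h=0.1, controls=(-1.0, 0.0, 1.0),
    b=lambda x, a: a + 0.0 * x,
    sigma=lambda x: np.ones_like(x),
    k=lambda x, a: x ** 2 + 0.1 * a * a,
    beta=1.0, c1=1.0, c2=1.0,
)
print(V[0], V[-1])
```

A simple sanity check is the case of a constant cost rate $k \equiv 1$ with $c_1 = c_2 = 0$ and constant $\sigma$: the fixed point is then the constant $\Delta t^h / (1 - e^{-\beta\Delta t^h})$, which is close to $1/\beta$ for small $h$.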


A continuous-time approximating Markov process. The following fact will be useful later. Suppose that the $\xi_n^h$ were replaced by a continuous-time Markov chain $\psi^h(\cdot)$ with transition probabilities $p^h(x,y\mid\alpha)$ and whose mean holding times, when in state $x$ with control value $\alpha$ used, are $\Delta t^h(x,\alpha)$. Let the cost function be just the no-delay form of (1.5). Then the Bellman equation is still (3.4), modulo an asymptotically negligible difference in the cost rate and discount factor [10, Section 4.3]. Thus, either model $\xi_n^h$ or $\psi^h(\cdot)$ could be used to study the convergence of the numerical procedure.


Constant interpolation interval. For simplicity of coding and considerations of memory requirements in the next section, it will be useful to have $\Delta t^h(\cdot)$ not depending on the state or control. This is easily arranged and the desired transition probabilities and interpolation interval are readily obtained from the $p^h(\cdot)$, $\Delta t^h(\cdot)$ above, as follows. Define the new interpolation interval $\Delta^h = \inf_{\alpha\in U^h, x\in G_h}\Delta t^h(x,\alpha)$. The possibility that $\Delta^h < \Delta t^h(x,\alpha)$ at some $x, \alpha$ is compensated for by allowing the state $x$ to communicate with itself at that point. Let $\bar p^h(x,y\mid\alpha)$ denote the new transition probabilities. Conditioned on the event that a state does not communicate with itself on the current transition, the transition probabilities are as in (3.3). Thus, the general formula for getting them from the $p^h(\cdot)$ is ([10, Section 7.7])

$$\bar p^h(x,y\mid\alpha) = p^h(x,y\mid\alpha)\big(1 - \bar p^h(x,x\mid\alpha)\big) \ \text{ for } x \ne y, \qquad \bar p^h(x,x\mid\alpha) = 1 - \frac{\Delta^h}{\Delta t^h(x,\alpha)}. \tag{3.5}$$
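The rescaling in (3.5) can be sketched in a few lines. The example below is an illustration only: the coefficients `sigma2` and `drift`, the grid, and the function name are hypothetical. It builds the chain (3.3) on a small grid, applies (3.5) with $\Delta^h$ equal to the smallest interval on the grid, and keeps the results so that the mean increment per unit of the new interval can be compared with the drift.

```python
def constant_interval_chain(p_up, p_down, dt, dt_min):
    """Apply (3.5): return (p_up_bar, p_down_bar, p_stay) for interval dt_min <= dt."""
    if dt_min > dt:
        raise ValueError("dt_min must not exceed dt")
    p_stay = 1.0 - dt_min / dt           # probability that x communicates with itself
    return p_up * (1.0 - p_stay), p_down * (1.0 - p_stay), p_stay


# Hypothetical coefficients on the grid {0, 0.1, ..., 1.0}.
h = 0.1
alpha = 0.5
grid = [i * h for i in range(11)]
sigma2 = lambda x: (1.0 + x) ** 2
drift = lambda x, a: a - x

dt = {x: h * h / sigma2(x) for x in grid}
dt_min = min(dt.values())                # the constant interval Delta^h

rows = []
for x in grid:
    s2 = sigma2(x)
    p_up = (s2 + h * drift(x, alpha)) / (2.0 * s2)
    p_down = (s2 - h * drift(x, alpha)) / (2.0 * s2)
    rows.append((x,) + constant_interval_chain(p_up, p_down, dt[x], dt_min))

for x, pu, pd, ps in rows:
    print(round(x, 1), pu, pd, ps)
```

Because the probabilities of moving are scaled by $\Delta^h/\Delta t^h(x,\alpha)$, the conditional mean increment becomes $b(x,\alpha)\Delta^h$, so local consistency is preserved with the new interval.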


Continuous-time interpolations. The proofs of convergence in [10] depend on continuous-time interpolations of the $\xi_n^h$ process. The simplest interpolation, called $\xi^h(\cdot)$, is piecewise constant with intervals $\Delta t_n^h$. For any $t$, the interval $[t,t)$ is considered to be empty. Define $t_n^h = \sum_{i=0}^{n-1}\Delta t_i^h$, and define $\xi^h(t) = \xi_n^h$ and $u^h(t) = u_n^h$ for $t \in [t_n^h, t_{n+1}^h)$. Suppose that $\xi_n^h = -h$ or $B+h$, a reflecting state. Then $\Delta t_n^h = 0$ and the interval $[t_n^h, t_n^h + \Delta t_n^h)$ is empty. This implies the important fact that the values of the reflecting states are ignored in constructing the continuous-time interpolation. This will always be the case. Let $m^h(\cdot)$ denote the relaxed control representation of $u^h(\cdot)$. It is constant on the intervals $[t_n^h, t_{n+1}^h)$ and its derivative on that interval is defined by $m^{h\prime}(A,t) = I_{\{u_n^h \in A\}}$. Then, by (3.1),

$$\xi_{n+1}^h = \xi_n^h + b^h(\xi_n^h, u_n^h)\Delta t_n^h + \beta_n^h + \delta z_n^h, \tag{3.6}$$

where $\delta z_n^h = \delta y_{1,n}^h - \delta y_{2,n}^h$, where $\delta y_{1,n}^h = hI_{\{\xi_n^h = -h\}}$ and $\delta y_{2,n}^h = hI_{\{\xi_n^h = B+h\}}$.

For each $t \ge 0$, define the stopping time, where $\sum_{i=0}^{-1} = 0$,

$$d^h(t) = \max\Big\{n : \sum_{i=0}^{n-1}\Delta t_i^h = t_n^h \le t\Big\}.$$

Note that $d^h(t)$ will never be the index of a reflecting state, since the time intervals for those are zero. Then

$$\xi^h(t) = \xi^h(0) + \sum_{i=0}^{d^h(t)-1} b^h(\xi_i^h, u_i^h)\Delta t_i^h + B^h(t) + z^h(t), \tag{3.7}$$

where

$$B^h(t) = \sum_{i=0}^{d^h(t)-1}\beta_i^h, \qquad z^h(t) = \sum_{i=0}^{d^h(t)-1}\delta z_i^h.$$

In interpolated and relaxed control form,

$$\xi^h(t) = x(0) + \int_0^t\int_{U^h} b^h(\xi^h(s),\alpha)\,m^h(d\alpha\,ds) + B^h(t) + z^h(t).$$
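The index $d^h(t)$ and the piecewise constant interpolation can be computed directly from the sequence of intervals. The following sketch is illustrative only (the function name and the sample path are assumptions); it uses `numpy.searchsorted` to find the largest $n$ with $t_n^h \le t$, and shows that a reflecting state, whose interval is zero, is never selected.

```python
import numpy as np


def interpolation_index(dts, t):
    """Return d(t) = max{n : t_n <= t}, where t_n = dts[0] + ... + dts[n-1]."""
    t_n = np.concatenate(([0.0], np.cumsum(dts)))   # t_0, t_1, ..., t_N
    if not (0.0 <= t < t_n[-1]):
        raise ValueError("t must lie in [0, t_N)")
    # side="right" counts the entries <= t; subtracting 1 gives the largest such index.
    return int(np.searchsorted(t_n, t, side="right")) - 1


# Hypothetical sample path: 0 -> -h (reflecting, zero interval) -> 0 -> h.
h, dt = 0.1, 0.01
xi = [0.0, -h, 0.0, h]
dts = [dt, 0.0, dt, dt]

for t in (0.005, 0.01, 0.015, 0.025):
    d = interpolation_index(dts, t)
    print(t, d, xi[d])
```

In this sample path the index 1, which belongs to the reflecting state $-h$, is skipped for every $t$, in agreement with the remark that the reflecting states are ignored by the interpolation.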

Although the fact will not be used in the sequel, it is interesting to note that $B^h(\cdot)$ is an approximation to a stochastic integral in the following sense. There are martingale differences $\delta w_n^h$ whose continuous-time interpolation (intervals $\Delta t_n^h$) converges weakly to a standard Wiener process and $\beta_n^h \approx \sigma(\xi_n^h)\delta w_n^h$ in the sense that [7, Section 6.6]

$$\sum_{i=0}^{n-1}\beta_i^h = \sum_{i=0}^{n-1}\sigma(\xi_i^h)\,\delta w_i^h + \text{asymptotically negligible error}.$$

There is another continuous-time interpolation that will be important. Recall the comment in the paragraph below (3.4) concerning the equivalence of the Bellman equations for the sequence $\xi_n^h$ and the continuous-time Markov process $\psi^h(\cdot)$. In the proofs of convergence in [10, Chapter 10], this latter process was used since it simplified the proof. It will also be used here for that purpose. For the delay case, both interpolations $\xi^h(\cdot)$ and $\psi^h(\cdot)$ play a role due to the presence of the segment $\bar x(t)$ in the dynamics and the need to approximate it in the dynamics of the approximating chain. Let us now formalize the definition of $\psi^h(\cdot)$. Let $\nu_n^h$ be i.i.d. exponentially distributed variables with unit mean, and independent of $\{\xi_n^h, u_n^h\}$. Define the intervals $\Delta\tau_n^h = \nu_n^h\Delta t_n^h$. Then $\psi^h(\cdot)$ denotes the continuous-time interpolation of the sequence $\{\xi_n^h\}$ with intervals $\{\Delta\tau_n^h\}$. Let $u_\tau^h(\cdot)$ denote the continuous-time interpolation of the controls $\{u_n^h\}$ with intervals $\{\Delta\tau_n^h\}$, and let $m_\tau^h(\cdot)$ be its relaxed control representation. Analogously, let $M^h(\cdot)$ (resp., $y_{\tau,i}^h(\cdot)$) be the continuous-time interpolation of $\{\beta_n^h\}$ (resp., of $\{\delta y_{i,n}^h\}$) with intervals $\{\Delta\tau_n^h\}$. Write $y_\tau^h(\cdot) = (y_{\tau,1}^h(\cdot), y_{\tau,2}^h(\cdot))$ and $z_\tau^h(\cdot) = y_{\tau,1}^h(\cdot) - y_{\tau,2}^h(\cdot)$.

By the definition, $\psi^h(\cdot)$ is a continuous-time controlled Markov chain. We have, where now all processes are interpolated with the intervals $\Delta\tau_n^h$,

$$\psi^h(t) = x(0) + \int_0^t\int_{U^h} b^h(\psi^h(s),\alpha)\,m_\tau^h(d\alpha\,ds) + M^h(t) + z_\tau^h(t). \tag{3.8}$$

The process $M^h(\cdot)$ can be represented as [10, Section 10.4.1]

$$M^h(t) = \int_0^t\sigma^h(\psi^h(s))\,dw^h(s) = \int_0^t\sigma(\psi^h(s))\,dw^h(s) + \epsilon^h(t),$$

where $\epsilon^h(\cdot)$ converges weakly to the zero process and $w^h(\cdot)$ is a martingale with quadratic variation $It$ and which converges weakly to a standard Wiener process. In the proofs in [10, Chapter 10] it is shown that $\psi^h(\cdot)$ converges to an optimal limit process, with optimal cost $V(x) = \lim_h V^h(x)$. The analog of these facts will also be true when there are delays.

The interpolations $\xi^h(\cdot)$ and $\psi^h(\cdot)$ are asymptotically identical. They are both continuous-time scalings of the basic chain $\xi_n^h$ and the scalings are asymptotically identical. This fact was not needed in the proof of the classical no-delay case, but will be important in treating the delay case. The following theorem holds for the delay case in the next section as well.

Theorem 3.1. Recall the definition of $d^h(\cdot)$ below (3.6). Then, for each $t > 0$,

$$\lim_{h\to 0} E\sup_{s\le t}\Big[\sum_{i=0}^{d^h(s)}(\Delta\tau_i^h - \Delta t_i^h)\Big]^2 = 0. \tag{3.9}$$

Proof. Owing to the mutual independence of the exponential random variables $\{\nu_n^h\}$ and their independence of everything else, the discrete parameter process $L_n = \sum_{i=0}^{n}(\Delta\tau_i^h - \Delta t_i^h)$ is a martingale. Hence, the conditional expectation of the squared term in (3.9) given the $\{\Delta t_i^h, i < \infty\}$ equals

$$\sum_{i=0}^{d^h(t)}[\Delta t_i^h]^2 \le \Big(t + \sup_n\Delta t_n^h\Big)\sup_n\Delta t_n^h \to 0,$$

which yields the desired result since if $L_n$ is a martingale, then $E\sup_{j\le n}|L_j|^2 \le 4E|L_n|^2$. ■

Footnote: In [10], these were called just $u^h(\cdot)$ and $m^h(\cdot)$. But for the delay case both interpolations $\xi^h(\cdot)$ and $\psi^h(\cdot)$ are used and we need to distinguish them.
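The estimate in the proof can be illustrated by simulation. The sketch below assumes, purely for illustration, a constant interval $\Delta t_i^h = \Delta$ (so that $d^h(t)$ is deterministic) and estimates $E\sup_{j}|L_j|^2$ by Monte Carlo, alongside the bound $4E|L_n|^2 = 4(n+1)\Delta^2$ from the martingale inequality. The function name, sample sizes, and seed are arbitrary choices.

```python
import numpy as np


def mean_sup_sq(t, dt, n_paths, rng):
    """Monte Carlo estimate of E sup_j |L_j|^2 and the bound 4 * (number of terms) * dt^2."""
    n_terms = int(t / dt + 1e-9) + 1                 # indices 0, ..., d(t) for constant dt
    nu = rng.exponential(1.0, size=(n_paths, n_terms))
    L = np.cumsum((nu - 1.0) * dt, axis=1)           # L_n = sum of (dtau_i - dt_i)
    estimate = float(np.mean(np.max(np.abs(L), axis=1) ** 2))
    bound = 4.0 * n_terms * dt * dt
    return estimate, bound


rng = np.random.default_rng(0)
for dt in (0.04, 0.01, 0.0025):
    estimate, bound = mean_sup_sq(1.0, dt, 2000, rng)
    print(dt, estimate, bound)
```

Both the estimate and the bound shrink in proportion to $\Delta$ for fixed $t$, which is the mechanism behind (3.9).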


The “implicit” approximation. An alternative to the method of obtaining a Markov chain approximation that was illustrated by the use of (3.2) uses what has been called an “implicit” method of approximation [10, Chapter 12], again owing to its similarity to the implicit method of dealing with numerical solutions of parabolic PDEs via finite differences, although it is used here only as a heuristic guide to the construction and no assumption concerning differentiability is made. The fundamental difference between the explicit and implicit approaches to the Markov chain approximation lies in the fact that in the former the time variable is treated differently than the state variables: It is a true “time” variable, and its value increases by either (depending on the interpolation used) $\Delta t_n^h$ or $\Delta\tau_n^h$ at step $n$. In the implicit approach, the time variable is treated as just another state variable. It is discretized in the same manner as are the other state variables: For the no-delay case, the approximating Markov chain has a state space that is a discretization of the $(x,t)$-space, and the component of the state of the chain that comes from the original time variable does not necessarily increase its value at each step. The idea is analogous when there are delays, and leads to some interesting and possibly more efficient numerical schemes. Let $\delta > 0$ be the discretization level for the time variable. For the no-delay case, the “implicit procedure” analog of the transition probabilities of (3.3), obtained via use of (3.2), would start with the finite difference approximations of the form

$$f_t(x,t) \to \frac{f(x,t+\delta) - f(x,t)}{\delta}, \qquad f_x(x,t) \to \frac{f(x+h,t) - f(x-h,t)}{2h}, \qquad f_{xx}(x,t) \to \frac{f(x+h,t) + f(x-h,t) - 2f(x,t)}{h^2}. \tag{3.10}$$

Note that the approximations for $f_x$ and $f_{xx}$ are made at $t$ and not $t+\delta$. Denote the chain by $\zeta_n^{h,\delta} = (\phi_n^{h,\delta}, \xi_n^{h,\delta})$, the (time, space) variables.


The general implicit method. Analogously to the method of going from the formal approximation to the PDE above (3.2) to (3.3), the transition probabilities and interpolation interval can be determined by substituting (3.10) into the PDE $W_t(x,t) + [\sigma^2/2]W_{xx}(x,t) + bW_x(x,t) + k(x,t) = 0$ and collecting coefficients. But there is a general method that starts with the $p^h(\cdot)$, $\Delta t^h(\cdot)$. Suppose that at the current step the time variable does not advance. Then, conditioned on this event and on the value of the current spatial state, the distribution of the next spatial state is just the $p^h(x,y\mid\alpha)$ used previously. So one needs only determine the conditional probability that the time variable advances, conditioned on the current state. This is obtained by a “local consistency” argument and, no matter how the $p^h(\cdot)$ were derived, the (no-delay) transition probabilities and interpolation interval $p^{h,\delta}(\cdot)$, $\Delta t^{h,\delta}(\cdot)$ for the implicit procedure can be determined from the $p^h(\cdot)$, $\Delta t^h(\cdot)$ by the formulas [10, Section 12.4], for $x \in G_h$,

$$\begin{aligned}
p^h(x,y\mid\alpha) &= \frac{p^{h,\delta}(x,n\delta;\,y,n\delta\mid\alpha)}{1 - p^{h,\delta}(x,n\delta;\,x,n\delta+\delta\mid\alpha)},\\
p^{h,\delta}(x,n\delta;\,x,n\delta+\delta\mid\alpha) &= \frac{\Delta t^h(x,\alpha)}{\Delta t^h(x,\alpha) + \delta},\\
\Delta t^{h,\delta}(x,\alpha) &= \frac{\delta\,\Delta t^h(x,\alpha)}{\Delta t^h(x,\alpha) + \delta}.
\end{aligned} \tag{3.11}$$

These formulas hold provided only that no state communicates with itself under the $p^h(\cdot)$. The reflecting states $x = -h$ and $B+h$ are treated as before. In the no-delay case, the implicit procedure was used in [10] mainly to deal with control problems that were defined over a fixed finite time interval. It will be used in a quite different way in the delay case.

Footnote: Again, if $\sigma^2(x) < h|b(x,\alpha)|$ at some $x, \alpha$, then one-sided difference approximations can be used there to get the appropriate $p^h(\cdot)$, $\Delta t^h(\cdot)$ [10].
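The formulas (3.11) are easy to apply once the explicit quantities are known. The sketch below is a small illustration with hypothetical numbers and a hypothetical function name: it starts from the explicit probabilities of (3.3) at a single state, applies (3.11), and keeps the results so that the local consistency of both the time and space components can be inspected.

```python
def implicit_chain(p_up, p_down, dt, delta):
    """Apply (3.11): return (p_up_i, p_down_i, p_time, dt_i) for time step delta."""
    p_time = dt / (dt + delta)            # probability that only the time variable advances
    p_up_i = p_up * (1.0 - p_time)        # spatial moves occur only if time does not advance
    p_down_i = p_down * (1.0 - p_time)
    dt_i = delta * dt / (dt + delta)      # interpolation interval of the implicit chain
    return p_up_i, p_down_i, p_time, dt_i


# Hypothetical values: h = 0.1, sigma^2 = 1, drift b = 0.5, time step delta = 0.05.
h, s2, drift, delta = 0.1, 1.0, 0.5, 0.05
p_up = (s2 + h * drift) / (2.0 * s2)
p_down = (s2 - h * drift) / (2.0 * s2)
dt = h * h / s2

p_up_i, p_down_i, p_time, dt_i = implicit_chain(p_up, p_down, dt, delta)
mean_time_step = delta * p_time           # expected advance of the time variable
mean_space_step = h * (p_up_i - p_down_i) # expected spatial increment
print(p_up_i, p_down_i, p_time, dt_i)
```

The expected advance of the time variable equals the interpolation interval $\Delta t^{h,\delta}$, and the expected spatial increment equals $b\,\Delta t^{h,\delta}$, which is the local consistency argument mentioned above.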


4  The  System  With  Delays:  Consistency  and 
Convergence 


Local consistency conditions; delay in path only. The approach is analogous to what was done for the no-delay case. The main issues concern accounting for the fact that $b(\cdot)$, $\sigma(\cdot)$, and $k(\cdot)$ depend on the solution path over an interval of length $\tau$. We will construct a controlled process $\xi_n^h$, $n \ge 0$, and interpolation intervals $\Delta t_n^h$, $n \ge 0$, in much the same way as was done in Section 3. Details of a construction analogous to that of (3.2) and (3.3) are in the next section. The initial condition $\bar x(0)$ for (1.1) is an arbitrary continuous function. The numerics work on a discrete space, so this function will have to be approximated. The exact form of the approximation is not important at this point, and we simply assume that we use a sequence $\bar\xi_0^h \in D[G_h; -\tau, 0]$ that is piecewise constant and that converges to $\bar x(0)$ uniformly on $[-\tau, 0]$.

Given the $\Delta t_n^h$, define $t_n^h = \sum_{i=0}^{n-1}\Delta t_i^h$. Define $\xi^h(\cdot)$ such that on $[0,\infty)$, it is the continuous-time interpolation of $\{\xi_n^h\}$ with intervals $\{\Delta t_n^h\}$, as in Section 3, and the segment on $[-\tau, 0]$ is $\bar\xi_0^h$, where $\xi_0^h = \bar\xi_0^h(0)$. Define

$$\bar\xi^h(t,\theta) = \xi^h(t+\theta) \quad \text{for } \theta \in [-\tau, 0],$$

and set $\bar\xi_n^h(\cdot) = \bar\xi^h(t_n^h, \cdot)$, the interpolated path on $[t_n^h - \tau, t_n^h]$. We will also write $\bar\xi_n^h = \bar\xi_n^h(\cdot)$. Note that $\bar\xi_n^h(\cdot)$ is discontinuous at $t = 0$. The value there will be $\xi_n^h$ if that is in $G_h$. If $\xi_n^h$ is a reflecting state, then the value is the closest one on $G_h$. We need to define a path segment that plays the role of $\bar x(t)$. There is some flexibility in the way that this approximation is constructed from the $\{\xi_n^h\}$. The choice influences the computational complexity, and we return to this issue later. Until further notice, we use $\bar\xi_n^h$.

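Analogously to the no-delay case, the chain and intervals are assumed to satisfy the following properties (an example is given below). There is a function $\Delta t^h(\cdot)$ on $D[G_h; -\tau, 0] \times U^h$ such that $\Delta t_n^h = \Delta t^h(\bar\xi_n^h, u_n^h)$, where $u_n^h$ is the control applied at time $n$. Recall the definitions of $\Delta\xi_n^h$ and of $\beta_n^h$ above (3.1). The distribution of $\xi_{n+1}^h$, given the initial data and $\xi_i^h, u_i^h$, $i \le n$, will depend only on $\bar\xi_n^h, u_n^h$, and not otherwise on $n$, analogously to the case in Section 3. Thus, let $E_{\hat\xi}^{h,\alpha}$ denote the expectation under control value $u_0^h = \alpha$ and $\bar\xi_0^h = \hat\xi$, where $\hat\xi \in D[G_h; -\tau, 0]$ and is piecewise constant. Analogously to the no-delay case, local consistency for $\xi_n^h$, $\Delta t_n^h$ is said to hold if, for $\xi_0 = \hat\xi(0) \in G_h$ ((4.1) defines $b^h(\cdot)$, $a^h(\cdot)$),

$$\begin{aligned}
E_{\hat\xi}^{h,\alpha}\Delta\xi_0^h &= b^h(\hat\xi,\alpha)\Delta t^h(\hat\xi,\alpha) = b(\hat\xi,\alpha)\Delta t^h(\hat\xi,\alpha) + o(\Delta t^h(\hat\xi,\alpha)),\\
E_{\hat\xi}^{h,\alpha}\beta_0^h[\beta_0^h]' &= a^h(\hat\xi)\Delta t^h(\hat\xi,\alpha) = a(\hat\xi)\Delta t^h(\hat\xi,\alpha) + o(\Delta t^h(\hat\xi,\alpha)),\\
a(\hat\xi) &= \sigma(\hat\xi)\sigma'(\hat\xi),\\
\sup_{n,\omega}|\xi_{n+1}^h - \xi_n^h| &\to 0, \qquad \sup_{\hat\xi,\alpha}\Delta t^h(\hat\xi,\alpha) \to 0.
\end{aligned} \tag{4.1}$$

The reflecting boundary is treated exactly as for the no-delay case below (3.1). In particular, if $\xi_n^h = -h$ (resp., $B+h$), then $\xi_{n+1}^h = 0$ (resp., $B$) and the interpolation interval is zero.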

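Let $E_{\hat\xi}^{h,u^h}$ denote the expectation given initial condition $\hat\xi$ and control sequence $u^h = \{u_n^h, n < \infty\}$. The cost function for the chain is

$$W^h(\hat\xi, u^h) = E_{\hat\xi}^{h,u^h}\sum_{n=0}^{\infty} e^{-\beta t_n^h}\left[k(\bar\xi_n^h, u_n^h)\Delta t^h(\bar\xi_n^h, u_n^h) + c'\delta y_n^h\right], \qquad V^h(\hat\xi) = \inf_{u^h} W^h(\hat\xi, u^h). \tag{4.2}$$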


Let $y^h(\cdot)$ denote the continuous-time interpolation of $\{\delta y_n^h\}$ with intervals $\{\Delta t_n^h\}$. By [10, Theorem 11.1.3], for any $T < \infty$,

$$\limsup_h\sup_t E|y^h(t+T) - y^h(t)|^2 < \infty, \qquad \sup_t E|y(t+T) - y(t)|^2 < \infty. \tag{4.3}$$

This implies that the costs are well defined. Recall the definitions of the interpolations $\psi^h(\cdot)$, $u_\tau^h(\cdot)$, $M^h(\cdot)$, $m_\tau^h(\cdot)$ and of $d^h(s) = \max\{n : t_n^h \le s\}$ in Section 3. Define $d_\tau^h(s) = \max\{n : \tau_n^h \le s\}$ and set $q_\tau^h(s) = t^h_{d_\tau^h(s)}$. For $\xi_0 = \bar\xi_0^h(0)$, we have

$$\psi^h(t) = \xi_0 + \int_0^t\int_{U^h} b^h(\bar\xi^h(q_\tau^h(s)),\alpha)\,m_\tau^h(d\alpha\,ds) + M^h(t) + z_\tau^h(t), \tag{4.4}$$

where $M^h(\cdot)$ is a martingale with quadratic variation process

$$\int_0^t a^h(\bar\xi^h(q_\tau^h(s)))\,ds.$$

Modulo an asymptotically negligible error due to the “continuous time” approximation of the discount factor, the cost function (4.2) can be written as

$$W^h(\hat\xi, u^h) = E_{\hat\xi}^{h,u^h}\int_0^\infty e^{-\beta s}\left[\int_{U^h} k(\bar\xi^h(q_\tau^h(s)),\alpha)\,m_\tau^h(d\alpha\,ds) + c'dy_\tau^h(s)\right]. \tag{4.5}$$


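The transition probabilities. The approach of the simple example in Section 3 or, indeed, of any of the methods in [10] for obtaining the transition probabilities for the no-delay case can be readily adapted to the delay case. In all cases in [10], the transition probability for the chain in the no-delay case can be represented as a ratio; i.e., for $x \in G_h$,

$$P\{\xi_1^h = y \mid \xi_0^h = x, u_0^h = \alpha\} = p^h(x,y\mid\alpha) = N^h(x,y,\alpha)/D^h(x,\alpha),$$

and $\Delta t^h(x,\alpha) = h^2/D^h(x,\alpha)$, where $N^h(\cdot)$, $D^h(\cdot)$ are functions of $b(\cdot)$, $\sigma^2(\cdot)$:

$$N^h(x,y,\alpha) = N^h(b(x,\alpha), \sigma^2(x), y), \qquad D^h(x,\alpha) = D^h(b(x,\alpha), \sigma^2(x)).$$

For the delay case, for the same ratios, simply use the forms

$$P\{\xi_1^h = y \mid \bar\xi_0^h = \hat\xi, u_0^h = \alpha\} = \frac{N^h(b(\hat\xi,\alpha), \sigma^2(\hat\xi), y)}{D^h(b(\hat\xi,\alpha), \sigma^2(\hat\xi))}. \tag{4.6}$$

Consider, in particular, the delay case analog of the approach that led to (3.3). Suppose that $\sigma^2(\hat\xi) \ge h|b(\hat\xi,\alpha)|$. Write $\hat\xi = \bar\xi_0^h$, $\xi_0 = \xi_0^h = \hat\xi(0)$, and $P_{\hat\xi}^{h,\alpha}\{\xi_1^h = y\} = P\{\xi_1^h = y \mid \bar\xi_0^h = \hat\xi, u_0^h = \alpha\}$. The analog for the delay case is (which defines $N^h(\cdot)$, $D^h(\cdot)$)

$$\begin{aligned}
P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0 \pm h\} &= \frac{\sigma^2(\hat\xi) \pm hb(\hat\xi,\alpha)}{2\sigma^2(\hat\xi)} = \frac{N^h(b(\hat\xi,\alpha), \sigma^2(\hat\xi), \pm h)}{D^h(b(\hat\xi,\alpha), \sigma^2(\hat\xi))},\\
\Delta t^h(\hat\xi,\alpha) &= \frac{h^2}{\sigma^2(\hat\xi)} = \frac{h^2}{D^h(b(\hat\xi,\alpha), \sigma^2(\hat\xi))}.
\end{aligned} \tag{4.7}$$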


The cost rate becomes $k(\hat\xi,\alpha)$. If $\sigma(\cdot)$ is a constant, then the intervals $\Delta t_n^h$ are all $h^2/\sigma^2$. The following assumption obviously holds for our special example. It is unrestrictive in general.
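The only change from the no-delay construction is that the coefficients are evaluated on the path segment. The following sketch illustrates (4.7) under assumptions made only for this example: the segment is stored as a list of values of a piecewise constant path on $[-\tau, 0]$ (oldest first, current value last), and the drift `b_delay`, which mixes the current and the oldest value, and the constant `sigma2_delay` are hypothetical.

```python
def delay_chain(segment, alpha, h, b, sigma2):
    """Transition probabilities and interval of (4.7) for a given path segment.

    segment: values of the piecewise constant path on [-tau, 0], oldest first;
             segment[-1] is the current value xi_0.
    Returns ({+h: p_up, -h: p_down}, dt).
    """
    s2 = sigma2(segment)
    drift = b(segment, alpha)
    if s2 < h * abs(drift):
        raise ValueError("(4.7) needs sigma^2 >= h*|b| on this segment")
    probs = {+h: (s2 + h * drift) / (2.0 * s2),
             -h: (s2 - h * drift) / (2.0 * s2)}
    return probs, h * h / s2


# Hypothetical delayed coefficients: feedback from the current and the oldest value.
def b_delay(segment, alpha):
    return -segment[-1] + 0.5 * segment[0] + alpha


def sigma2_delay(segment):
    return 1.0


h = 0.1
segment = [0.4, 0.5, 0.5, 0.6]
probs, dt = delay_chain(segment, 0.2, h, b_delay, sigma2_delay)
mean_step = sum(step * p for step, p in probs.items())
print(probs, dt, mean_step / dt)
```

As in the no-delay case, the mean increment per unit interpolated time reproduces the drift, now evaluated on the whole segment rather than at a single point.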


A4.1.  The  transition  probabilities  and  interpolation  intervals  are  given  in  the 
form  (4.6). 

The proof of the next two theorems for the convergence of the numerical method is in Section 8.

Theorem 4.1. Let $\xi_n^h$, $\Delta t_n^h$ be locally consistent with the model (1.1), whose initial condition is $\bar x(0)$, a continuous function. Let $\bar\xi_0^h \in D[G_h; -\tau, 0]$ be any piecewise constant sequence that converges to $\bar x(0)$ uniformly on $[-\tau, 0]$. Assume (A1.1), (A1.3), and (A4.1). Then $V^h(\bar\xi_0^h) \to V(\bar x(0))$.

Delay in the path and control. As for the case where only the path is delayed, one constructs a chain $\xi_n^h$, $n \ge 0$, and interpolation intervals $\Delta t_n^h$, $n \ge 0$. The initial data for (1.4) is $\bar x(0) = \hat x$ and $\bar u(0) \in L_2[U; -\tau, 0]$. The initial control data for the chain is slightly different. For the process (1.4), either $u(s)$, $s \in [-\tau, 0]$ or $u(s)$, $s \in [-\tau, 0)$ will do for the initial control data. But for the chain, the control $u_0^h$ is to be determined at time 0, and should not be given as part of the initial data. This fact accounts for some of the definitions below.

Let $\bar\xi_0^h$, $\xi_0$ be as above Theorem 4.1. Let $\bar u_0^h$ be any piecewise constant sequence in $L_2[U^h; -\tau, 0]$ whose intervals are those of the $\bar\xi_0^h$, and that converges to $\bar u(0)$ in the $L_2$-sense. Let $u^h(\cdot)$ denote the function on $[-\tau, \infty)$ that equals $\bar u_0^h$ on $[-\tau, 0)$, and on $[0, \infty)$ equals the continuous-time interpolation of the $u_n^h$ with intervals $\Delta t_n^h$. Let $\bar u_n^h(\theta)$, $\theta \in [-\tau, 0]$, denote the segment of $u^h(\cdot)$ on $[t_n^h - \tau, t_n^h]$: $\bar u_n^h(\theta) = u^h(t_n^h + \theta)$, $\theta \in [-\tau, 0]$. Let $\tilde u_n^h$ denote the segment on the half open interval $[t_n^h - \tau, t_n^h)$.

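Recall the definition of $\bar b(\cdot)$ in (A1.2). The distribution of $\xi_{n+1}^h$, given the initial data and $\xi_i^h, u_i^h$, $i \le n$, depends only on $\bar\xi_n^h, \bar u_n^h$ and not on $n$ otherwise. Thus, let $E_{\hat\xi,\hat u_0}^{h}$ denote the expectation given $\bar\xi_0^h = \hat\xi$, $\bar u_0^h = \hat u_0$. The local consistency condition for the chain and interpolation intervals is that there exists a function $\Delta t^h(\cdot)$ such that $\Delta t_n^h = \Delta t^h(\bar\xi_n^h, \bar u_n^h)$ and

$$\begin{aligned}
E_{\hat\xi,\hat u_0}^{h}\Delta\xi_0^h &= b^h(\hat\xi,\hat u_0)\Delta t^h(\hat\xi,\hat u_0) = \bar b(\hat\xi,\hat u_0)\Delta t^h(\hat\xi,\hat u_0) + o(\Delta t^h(\hat\xi,\hat u_0)),\\
E_{\hat\xi,\hat u_0}^{h}\beta_0^h[\beta_0^h]' &= a^h(\hat\xi)\Delta t^h(\hat\xi,\hat u_0) = a(\hat\xi)\Delta t^h(\hat\xi,\hat u_0) + o(\Delta t^h(\hat\xi,\hat u_0)),\\
a(\hat\xi) &= \sigma(\hat\xi)\sigma'(\hat\xi),\\
\sup_{n,\omega}|\xi_{n+1}^h - \xi_n^h| &\to 0, \qquad \sup_{\hat\xi,\hat u}\Delta t^h(\hat\xi,\hat u) \to 0.
\end{aligned} \tag{4.8}$$

The reflecting boundary is treated exactly as it was when only the path is delayed.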

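Now extend the definition of $u_\tau^h(\cdot)$ to the interval $[-\tau, \infty)$ by letting it equal $\bar u_0^h(s)$ for $s \in [-\tau, 0)$, and let $m_\tau^h(\cdot)$ denote the relaxed control representation of $u_\tau^h(\cdot)$. For $\xi_0 = \bar\xi_0^h(0)$, (4.4) is replaced by

$$\psi^h(t) = \xi_0 + \int_{-\tau}^0\int_0^t\int_{U^h} b^h(\bar\xi^h(q_\tau^h(s)),\alpha,v)\,m_\tau^h(d\alpha, ds+v)\,\mu(dv) + M^h(t) + z_\tau^h(t). \tag{4.9}$$

Let $\hat u$ denote the restriction of the canonical value $\hat u_0$ to the half open interval $[-\tau, 0)$. Let $E_{\hat\xi,\hat u}^{h,u^h}$ denote the expectation under initial data $\bar\xi_0^h = \hat\xi$ and control sequence $u^h = \{u_n^h, n \ge 0\}$, with initial control segment (on $[-\tau, 0)$) $\hat u$. Recalling the definition of $\bar k(\cdot)$ in A1.2, the cost function can be written as

$$W^h(\hat\xi,\hat u,u^h) = E_{\hat\xi,\hat u}^{h,u^h}\sum_{n=0}^{\infty} e^{-\beta t_n^h}\left[\bar k(\bar\xi_n^h, \bar u_n^h)\Delta t_n^h + c'\delta y_n^h\right], \qquad V^h(\hat\xi,\hat u) = \inf_{u^h} W^h(\hat\xi,\hat u,u^h). \tag{4.10}$$

In integral and relaxed control form, and modulo an asymptotically negligible error due to the approximation of the discount factor, (4.10) equals

$$W^h(\hat\xi,\hat u,u^h) = E_{\hat\xi,\hat u}^{h,u^h}\int_{-\tau}^0\int_0^\infty\int_{U^h} e^{-\beta t}\left[k(\bar\xi^h(q_\tau^h(t)),\alpha,v)\,m_\tau^h(d\alpha, dt+v) + c'dy_\tau^h(t)\right]\mu(dv). \tag{4.10}$$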

In general, the transition probabilities are given by a ratio as in (4.6) and we formalize this as follows.

A4.2. The transition probabilities and interpolation intervals are given in the form (4.6), where $\bar b(\cdot)$ replaces $b(\cdot)$.

Theorem 4.2. Let $\xi_n^h$, $\Delta t_n^h$ be locally consistent with (1.4), which has the initial data $\bar x(0)$, a continuous function, and $\bar u(0) \in L_2[U; -\tau, 0]$. Let $\bar\xi_0^h \in D[G_h; -\tau, 0]$ be piecewise constant, and converge to $\bar x(0)$ uniformly on $[-\tau, 0]$. Let $\bar u_0^h \in L_2[U^h; -\tau, 0]$ be piecewise constant with the same intervals as $\bar\xi_0^h$, have values $\bar u_0^h(\theta) \in U^h$, and converge to $\bar u(0)$ in the sense of $L_2$. Let $\tilde u_0^h$ denote the segment of $\bar u_0^h$ on $[-\tau, 0)$. Assume (A1.2), (A1.3) and (A4.2). Then $V^h(\bar\xi_0^h, \tilde u_0^h) \to V(\bar x(0), \bar u(0))$.


5  Computational  Procedures 


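The Bellman equation. Path only delayed. Let $\Delta^h = \inf_{\hat\xi,\alpha}\Delta t^h(\hat\xi,\alpha)$, where $\alpha \in U^h$, $\hat\xi \in D[G_h; -\tau, 0]$, and suppose (w.l.o.g.) that $\tau/\Delta^h = K^h$ is an integer. The interpolated time interval $[t_n^h - \tau, t_n^h]$ is covered by at most $K^h$ intervals of length $\Delta^h$. The $\bar\xi_n^h$ can be represented in terms of a finite state Markov process as follows. Recall that the reflection states do not appear in the construction of $\xi^h(\cdot)$. Let $\xi_{n,i}^h$, $i > 0$, denote the $i$th nonreflection state before time $n$, and $\Delta t_{n,i}^h$ the associated interpolation interval. We can represent $\bar\xi_n^h$ in terms of $(\xi_{n,K^h}^h, \Delta t_{n,K^h}^h), \ldots, (\xi_{n,1}^h, \Delta t_{n,1}^h), \xi_n^h$. This new representation is clearly a $(2K^h + 1)$-dimensional controlled Markov chain. Let $\hat\xi$ be an arbitrary element of $D[G_h; -\tau, 0]$ that is piecewise constant and $\hat\xi(0) = \xi_0$. Then, if there are no delays in the control, the Bellman equation for the process defined by either this chain or (4.9) with cost (4.2) can be written as

$$V^h(\hat\xi) = \inf_{\alpha\in U^h}\Big[e^{-\beta\Delta t^h(\hat\xi,\alpha)}\sum_{\pm} P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0 \pm h\}V^h(y^\pm) + k(\hat\xi,\alpha)\Delta t^h(\hat\xi,\alpha)\Big]. \tag{5.1a}$$

The terms $y^\pm$ denote the functions on $[-\tau, 0]$ with values

$$\begin{aligned}
y^\pm(\theta) &= \hat\xi(\theta + \Delta t^h(\hat\xi,\alpha)), & -\tau \le \theta < -\Delta t^h(\hat\xi,\alpha),\\
y^\pm(\theta) &= \xi_0, & -\Delta t^h(\hat\xi,\alpha) \le \theta < 0,\\
y^\pm(0) &= \xi_0 \pm h.
\end{aligned}$$

If $\hat\xi(0) = \xi_0 = -h$, and the other values of $\hat\xi(\cdot)$ are in $G_h$, then $\Delta t^h(\hat\xi,\alpha) = 0$ and

$$V^h(\hat\xi) = V^h(y^+) + c_1h. \tag{5.1b}$$

If $\hat\xi(0) = \xi_0 = B + h$, and the other values of $\hat\xi(\cdot)$ are in $G_h$, then $\Delta t^h(\hat\xi,\alpha) = 0$ and

$$V^h(\hat\xi) = V^h(y^-) + c_2h. \tag{5.1c}$$

Owing to the contraction due to the discounting, there is a unique solution to (5.1).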


Simplifying the state representation. Path only delayed. If the interpolation interval $\Delta t^h(\hat\xi,\alpha)$ is not constant, then the construction of the $\bar\xi_n^h$ requires that we keep a record of the values of both the $\xi_i^h$, $\Delta t_i^h$, for the indices $i$ that contribute to the construction. The use of constant interpolation intervals simplifies this problem. Suppose that the intervals are constant with value $\Delta^h$. It is then apparent from the form of the Bellman equation (5.1) that the state space for the control problem for the approximating chain consists of functions $\hat\xi(\cdot)$ that are constant on $[-\tau + i\Delta^h, -\tau + i\Delta^h + \Delta^h)$, $i < K^h$, with values in $G_h$ there and with $\hat\xi(0) \in G_h \cup \{-h, B+h\}$. In addition, $\bar\xi_n^h$ is a piecewise constant interpolation of the $K^h + 1$ values $\bar\xi_n^h = (\xi_{n,K^h}^h, \ldots, \xi_{n,1}^h, \xi_n^h)$. We can identify $\bar\xi_n^h$ with this vector without ambiguity. If $\xi_n^h \in G_h$, then $\bar\xi_{n+1}^h = (\xi_{n,K^h-1}^h, \ldots, \xi_{n,1}^h, \xi_n^h, \xi_{n+1}^h)$. If $\xi_n^h = -h$, then $\bar\xi_{n+1}^h = (\xi_{n,K^h}^h, \ldots, \xi_{n,1}^h, 0)$, and analogously if $\xi_n^h = B + h$. Thus the full state vector is $(K^h + 1)$-dimensional and the maximum number of possible values can be very large, up to $(B/h + 3)(B/h + 1)^{K^h}$.
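The update of the memory vector described above behaves like a shift register. The following sketch is illustrative only: states are stored in grid units (the integer $i$ stands for $x = ih$), so that $G_h = \{0, \ldots, n_{\max}\}$ with $n_{\max} = B/h$, and the reflecting states are $-1$ and $n_{\max}+1$; the function names are hypothetical.

```python
def next_memory(memory, new_value, n_max):
    """One step of the memory vector (xi_{n,K}, ..., xi_{n,1}, xi_n), in grid units.

    If the current state is reflecting, it is replaced by the nearest point of G_h
    and nothing is shifted (the interpolation interval is zero).  Otherwise the
    oldest entry is dropped and the new state new_value is appended.
    """
    current = memory[-1]
    if current == -1:
        return memory[:-1] + (0,)
    if current == n_max + 1:
        return memory[:-1] + (n_max,)
    return memory[1:] + (new_value,)


def state_count(n_max, K):
    """Upper bound (B/h + 3) * (B/h + 1)^K on the number of memory vectors."""
    return (n_max + 3) * (n_max + 1) ** K


n_max = 10                               # B/h
memory = (3, 2, 1, 0)                    # K = 3 past values and the current value 0
memory = next_memory(memory, -1, n_max)  # the chain steps to the reflecting state -h
reflected = next_memory(memory, None, n_max)
print(memory, reflected, state_count(n_max, 3))
```

Even for this small example with $K^h = 3$ the bound is 17303 states, which illustrates how quickly the memory requirement grows with $K^h$.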

The analog of the procedure (3.5) for getting an approximating chain with a constant interpolation interval is obvious. Let $\bar P$ denote the transition probabilities for the constant interpolation interval case. For the delay case with $\hat\xi(0) = \xi_0 = \xi_0^h \in G_h$, and $\xi_1^h \in \xi_0 + \{\pm h, 0\}$, use

$$\begin{aligned}
\bar P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0 \pm h\} &= P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0 \pm h\}\big(1 - \bar P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0\}\big),\\
\bar P_{\hat\xi}^{h,\alpha}\{\xi_1^h = \xi_0\} &= 1 - \frac{\Delta^h}{\Delta t^h(\hat\xi,\alpha)}.
\end{aligned} \tag{5.2}$$

Simplification of the state space. We only need to keep track of $\xi_n^h$ and the differences between successive components of $\bar\xi_n^h$. This gives the representation

$$\bar\xi_n^h = (c_{n,K^h}^h, \ldots, c_{n,1}^h, \xi_n^h), \quad \text{where, for } 1 < i \le K^h, \quad c_{n,i}^h = \xi_{n,i-1}^h - \xi_{n,i}^h, \tag{5.3}$$

and $c_{n,1}^h = \xi_n^h - \xi_{n,1}^h$. Suppose now that $\sigma^2(\cdot)$ is constant and non-zero and that (4.7) is used, so that $\Delta t^h(\hat\xi,\alpha) = h^2/\sigma^2$ is constant. Then the $c_{n,i}^h$ take at most two values and the number of values in the state space is reduced to $(B/h + 3)2^{K^h}$. The two values and the reconstruction of the $\xi_{n,i}^h$ from them are easily determined by an iterative procedure. For example, if $\xi_n^h = -h$, then $\xi_{n,1}^h = 0$. If $\xi_n^h = 0$ then $\xi_{n,1}^h \in \{0, h\}$. If $\xi_n^h$ is not a reflecting or boundary value then $\xi_{n,1}^h = \xi_n^h \pm h$. If $\xi_{n,1}^h = 0$, then $\xi_{n,2}^h \in \{0, h\}$. If $\xi_{n,1}^h$ is not a boundary value then $c_{n,2}^h \in \{-h, h\}$, and so forth.

If $\sigma^2(\cdot)$ is not constant, then use the form (5.2) to get a chain with a constant interval. Now $\xi_{n+1}^h - \xi_n^h \in \{-h, 0, h\}$, each of the $c_{n,i}^h$ can take as many as three values, and we have at most $(B/h + 3)3^{K^h}$ values in the state space. The approach in the next section uses fewer intervals to cover $[-\tau, 0]$ and has the promise of being more efficient in terms of memory requirements.
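The difference representation (5.3) is a lossless re-encoding of the memory vector, and its benefit is in the count of possible values. The sketch below, with hypothetical function names and states stored in grid units as integers, encodes a memory vector as the current value plus successive differences, decodes it again, and compares the two bounds on the size of the state space.

```python
def encode(memory):
    """Map (xi_{n,K}, ..., xi_{n,1}, xi_n) to ((c_K, ..., c_1), xi_n) as in (5.3)."""
    diffs = tuple(memory[i + 1] - memory[i] for i in range(len(memory) - 1))
    return diffs, memory[-1]


def decode(diffs, current):
    """Rebuild the memory vector iteratively, starting from the current value."""
    values = [current]
    for c in reversed(diffs):            # c_1 first, then c_2, ...
        values.append(values[-1] - c)
    return tuple(reversed(values))


def full_count(n_max, K):
    return (n_max + 3) * (n_max + 1) ** K   # bound without the difference encoding


def reduced_count(n_max, K, values_per_diff=2):
    return (n_max + 3) * values_per_diff ** K


memory = (3, 2, 1, 0, 0)                 # grid units; the repeated 0 follows a reflection
diffs, current = encode(memory)
print(diffs, current, decode(diffs, current))
print(full_count(10, 20), reduced_count(10, 20))
```

With $B/h = 10$ and $K^h = 20$ the bound drops from $13 \cdot 11^{20}$ to $13 \cdot 2^{20} = 13631488$ when each difference takes two values, which is the reduction described in the text.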


Delay in path and control. Now suppose that both the control and the path are delayed. The memory state at time $n$ for the discrete-time dynamic program is $\bar\xi_n^h, \tilde u_n^h$, the value of $\xi^h(\cdot)$ on the closed interval $[t_n^h - \tau, t_n^h]$ together with the path of the interpolated control (intervals $\Delta t_n^h$) on the half open interval $[t_n^h - \tau, t_n^h)$. The canonical value of $\tilde u_n^h$ is $\hat u$. The transition probabilities at time $n$ depend on the memory variables and the new control value $u_n^h$. Thus, write $(\hat u, \alpha)$ for the canonical value of the control on $[-\tau, 0]$, where $\alpha$ denotes the value at time 0. We can then use terms such as $\bar b(\hat\xi,\hat u,\alpha)$ without ambiguity. The form (4.6) still applies, with $\bar b(\hat\xi,\hat u,\alpha)$ used in lieu of $b(\hat\xi,\alpha)$.

Analogously to what was done at the beginning of the section for the case where the control is not delayed, the memory variables can be imbedded into a Markov process, with values at time $n$

$$(\xi_{n,K^h}^h, u_{n,K^h}^h, \Delta t_{n,K^h}^h), \ldots, (\xi_{n,1}^h, u_{n,1}^h, \Delta t_{n,1}^h), \xi_n^h.$$

Then for the transition probabilities that are the analogs of those in (4.7) for the present case, the analog of (5.1a) is

$$V^h(\hat\xi,\hat u) = \inf_{\alpha\in U^h}\Big[e^{-\beta\Delta t^h(\hat\xi,\hat u,\alpha)}\sum_{\pm} P_{\hat\xi,\hat u}^{h,\alpha}\{\xi_1^h = \xi_0 \pm h\}V^h(y^\pm, \hat u_\alpha) + \bar k(\hat\xi,\hat u,\alpha)\Delta t^h(\hat\xi,\hat u,\alpha)\Big], \tag{5.4}$$

where the following definitions are used. $P_{\hat\xi,\hat u}^{h,\alpha}$ denotes the transition probability with memory variables $\hat\xi, \hat u$ and new control value $\alpha$ used. The $y^\pm$ denotes the new “path memory sections” defined below (5.1a), with $\Delta t^h(\hat\xi,\hat u,\alpha)$ used in lieu of $\Delta t^h(\hat\xi,\alpha)$. The new “control memory section” $\hat u_\alpha$ is defined by

$$\begin{aligned}
\hat u_\alpha(\theta) &= \hat u(\theta + \Delta t^h(\hat\xi,\hat u,\alpha)), & -\tau \le \theta < -\Delta t^h(\hat\xi,\hat u,\alpha),\\
\hat u_\alpha(\theta) &= \alpha, & -\Delta t^h(\hat\xi,\hat u,\alpha) \le \theta < 0.
\end{aligned}$$

The reflecting states are treated as for the no-delay case. Because of the contraction due to the discounting there is a unique solution to (5.4).

We can use the more efficient representation (5.3) for the path variable. But the total memory requirements might still be too large, unless $\tilde u_n^h$ itself can be effectively “approximated” by only a few values.

A comment on higher-dimensional problems. We have concentrated on one-dimensional models. But the ideas concerning approximation and the convergence results all extend to quite general higher-dimensional problems. The solution to the reflected diffusion is confined to a compact region $G$ by a boundary reflection. The conditions on $G$ and the reflection directions are exactly as in [10] and there is no need to say more. The Markov chain approximations for the delay problem in higher dimensions adapt the methods of the reference in the same way as was done for the one-dimensional problem considered in this paper, using the comments of this paper as a guide to the substitutions. For the no-delay problem, the required memory grows rapidly as the dimension increases, and that also holds here. Two-dimensional problems are feasible at present.

Representations analogous to (5.3) can also be used for the higher-dimensional problem. Consider a two-dimensional problem in a box $[0, B_1] \times [0, B_2]$, with the same path delay in each coordinate, no control delay, and discretization level $h$ in each coordinate. The $\xi_n^h$ in (5.3) is replaced by the vector containing the current two-dimensional value of the chain. The difference $c_{n,i}^h = \xi_{n,i-1}^h - \xi_{n,i}^h$ is now a two-dimensional vector. The values can be computed iteratively, as for the one-dimensional case, but the details will not be presented here.

6 The Implicit Approximation: Path Only Delayed

Let $\delta > 0$ with $h^2/\delta \to 0$ as $h \to 0$, $\delta \to 0$, and suppose that $\tau/\delta = Q$ is an integer. The process $\zeta_n^{h,\delta} = (\phi_n^{h,\delta}, \xi_n^{h,\delta})$ = (temporal variable, spatial variable), whose transition probabilities were defined by (3.11), leads to some intriguing possibilities for efficient representation of the memory data for the delay problem. Recall that either the spatial variable $\xi_n^{h,\delta}$ changed or the time variable $\phi_n^{h,\delta}$ advanced at each iteration, but not both. We will construct the analog of $\zeta_n^{h,\delta}$ for the delay case. There are several choices for the time scale of the continuous-time interpolations. One can use the analog of the $\Delta t^{h,\delta}$ defined in (3.11), and proceed as in the last section. Another possibility, which we will pursue, is to let the value of $\phi_n^{h,\delta}$ determine the interpolation. More precisely, at the $n$th step the interpolated time would be $\phi_n^{h,\delta}$ and not $\sum_{i=0}^{n-1} \Delta t_i^{h,\delta}$. Then the time variable for the interpolation does not necessarily advance at each step of the chain. It will be seen that both interpolations are the same asymptotically.

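Suppose that $\hat\xi_n^{h,\delta}$ denotes the part of the path that represents the memory state at iterate $n$ for the chain. It will be defined precisely after writing the transition probabilities. Now, adapting the procedure that led to (3.11) to the delay case yields the transition probabilities and interpolation intervals for the $\zeta_n^{h,\delta} = (\phi_n^{h,\delta}, \xi_n^{h,\delta})$ process in terms of those for the $\xi_n^h$ process as:

$$
\begin{aligned}
P\{\xi_1^{h,\delta} = \tilde\xi,\ \phi_1^{h,\delta} = \phi_0^{h,\delta} \mid \hat\xi_0^{h,\delta} = \hat\xi,\ \phi_0^{h,\delta} = n\delta,\ u_0^{h,\delta} = \alpha\}
&= p^h(\hat\xi(0), \tilde\xi \mid \hat\xi, \alpha)\left(1 - \frac{\Delta t^h(\hat\xi,\alpha)}{\Delta t^h(\hat\xi,\alpha) + \delta}\right), \\
P\{\xi_1^{h,\delta} = \xi_0^{h,\delta},\ \phi_1^{h,\delta} = \phi_0^{h,\delta} + \delta \mid \hat\xi_0^{h,\delta} = \hat\xi,\ \phi_0^{h,\delta} = n\delta,\ u_0^{h,\delta} = \alpha\}
&= \frac{\Delta t^h(\hat\xi,\alpha)}{\Delta t^h(\hat\xi,\alpha) + \delta}.
\end{aligned}
\tag{6.1}
$$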


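Define

$$
\Delta t^{h,\delta}(\hat\xi,\alpha) = \frac{\delta\,\Delta t^h(\hat\xi,\alpha)}{\Delta t^h(\hat\xi,\alpha) + \delta},
\qquad
t_n^{h,\delta} = \sum_{i=0}^{n-1} \Delta t_i^{h,\delta}.
\tag{6.2}
$$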
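As a small numerical illustration of (6.1) and (6.2), the following sketch computes the probability that the time variable advances and the implicit interval for one state. The function name and the sample values of `h`, `sigma`, and the explicit interval `dt_h = h**2 / sigma**2` are illustrative assumptions, not taken from the paper.

```python
def implicit_step(dt_h: float, delta: float):
    """Return (p_advance, dt_implicit) for one step of the implicit chain.

    dt_h  : explicit interpolation interval Delta t^h at the current state
    delta : time discretization level of the implicit method
    """
    # Probability that the time variable advances by delta, as in (6.1).
    p_advance = dt_h / (dt_h + delta)
    # Implicit interpolation interval, as in (6.2).
    dt_implicit = delta * dt_h / (dt_h + delta)
    return p_advance, dt_implicit


# Illustrative (assumed) values: constant sigma, no drift.
h, sigma = 0.1, 1.0
dt_h = h ** 2 / sigma ** 2
delta = h / sigma
p_adv, dt_impl = implicit_step(dt_h, delta)
print(p_adv, dt_impl)
```

Note that `dt_implicit` equals `delta * p_advance`, which is the mean amount by which the time variable advances in one step, and that it is smaller than both `dt_h` and `delta`.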


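Interpolations. One could base the continuous-time interpolation used to get $\hat\xi_n^{h,\delta}$ on the intervals $\Delta t_n^{h,\delta}$. But then the issues concerning the number of required values of the memory variable would be similar to those of the last section. Consider the alternative where the time variables $\phi_n^{h,\delta}$ determine interpolated time, in that real (i.e., interpolated) time advances (by an amount $\delta$) only when the time variable is incremented and it does not advance otherwise. To make this precise, consider $\xi_n^{h,\delta}$ at only the times that $\phi_n^{h,\delta}$ changes. Define $\mu_0^{h,\delta} = 0$, and, for $n > 0$, set

$$\mu_n^{h,\delta} = \inf\{i > \mu_{n-1}^{h,\delta} : \phi_i^{h,\delta} - \phi_{i-1}^{h,\delta} = \delta\}.$$

Define the “memory” path segment $\hat\xi_n^{h,\delta}(\theta)$, $\theta \in [-\tau, 0]$, as follows. For any $l$ and $n$ satisfying $\mu_l^{h,\delta} \le n < \mu_{l+1}^{h,\delta}$, set

$$
\begin{aligned}
\hat\xi_n^{h,\delta}(0) &= \xi_n^{h,\delta}, \\
\hat\xi_n^{h,\delta}(\theta) &= \xi^{h,\delta}_{\mu_l^{h,\delta}}, \quad \theta \in [-\delta, 0), \\
&\ \ \vdots \\
\hat\xi_n^{h,\delta}(\theta) &= \xi^{h,\delta}_{\mu_{l-Q+1}^{h,\delta}}, \quad \theta \in [-\tau, -\tau + \delta).
\end{aligned}
\tag{6.3}
$$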
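Let $\hat\xi$ denote the canonical value of $\hat\xi_n^{h,\delta}$. It can be represented as the piecewise constant right continuous interpolation with interval $\delta$ of its values

$$(\hat\xi(-\tau), \hat\xi(-\tau+\delta), \ldots, \hat\xi(-\delta), \hat\xi(0)),$$

with a discontinuity at $\theta = 0$, as usual. The possible transitions are as follows. Let $\hat\xi(0) \in G_h$. Then $\hat\xi$ transits as either (if the time variable does not advance)

$$(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), \hat\xi(0)) \to (\hat\xi(-\tau), \ldots, \hat\xi(-\delta), \hat\xi(0) \pm h)$$

or (if the time variable advances)

$$(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), \hat\xi(0)) \to (\hat\xi(-\tau+\delta), \ldots, \hat\xi(-\delta), \hat\xi(0), \hat\xi(0)).$$

For the reflecting points, we have the immediate transition

$$(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), B + h) \to (\hat\xi(-\tau), \ldots, \hat\xi(-\delta), B),$$

and similarly from $-h$ to $0$. Thus $\hat\xi_n^{h,\delta}(\theta) \in G_h$ for $\theta < 0$.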
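The three kinds of transition just listed can be sketched in code as follows. This is only an illustration under stated assumptions: the memory state is stored as a tuple of integer grid indices (values in units of $h$), ordered from $\theta = -\tau$ to $\theta = 0$, and the function names `move`, `reflect`, and `advance` are hypothetical.

```python
from typing import Tuple

# Memory state (xi(-tau), ..., xi(-delta), xi(0)) as integer multiples of h.
State = Tuple[int, ...]


def move(state: State, step: int) -> State:
    """Time variable does not advance: only the current value moves by +-h."""
    assert step in (-1, 1)
    return state[:-1] + (state[-1] + step,)


def reflect(state: State, n_max: int) -> State:
    """Immediate transition from a reflecting point (-1 or n_max + 1) back into G_h."""
    cur = min(max(state[-1], 0), n_max)
    return state[:-1] + (cur,)


def advance(state: State) -> State:
    """Time variable advances: drop the oldest value and repeat the current one."""
    return state[1:] + (state[-1],)


# Example with Q = 2 delay intervals and G_h = {0, h, ..., 4h}.
s = (2, 3, 4)
s_up = move(s, +1)           # current value steps to the reflecting point 5
s_back = reflect(s_up, 4)    # and is immediately returned to 4
s_adv = advance(s)           # the window shifts by one interval delta
```

The stored past values are only ever copies of a current value that was already in $G_h$, which is why they never take the reflecting values.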


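Local properties and the dynamical equations. Define $\Delta\xi_n^{h,\delta} = \xi_{n+1}^{h,\delta} - \xi_n^{h,\delta}$ analogously to the definitions used in Sections 3 and 4. By (6.1) and (6.2),

$$E\left[\phi_{n+1}^{h,\delta} - \phi_n^{h,\delta} \mid \hat\xi_i^{h,\delta}, u_i^{h,\delta}, i \le n\right] = \Delta t_n^{h,\delta}, \qquad \Delta t_n^{h,\delta} = \Delta t^{h,\delta}(\hat\xi_n^{h,\delta}, u_n^{h,\delta}).$$

Define the martingale difference $\beta_n^{h,\delta} = \Delta\xi_n^{h,\delta} - E[\Delta\xi_n^{h,\delta} \mid \hat\xi_i^{h,\delta}, u_i^{h,\delta}, i \le n]$. With the definitions (6.1), (6.2), and $\hat\xi = \hat\xi_0^{h,\delta}$, $\hat\xi(0) \in G_h$, we have

$$E\left[\Delta\xi_0^{h,\delta} \mid \hat\xi_0^{h,\delta} = \hat\xi,\ u_0^{h,\delta} = \alpha\right] = b^h(\hat\xi,\alpha)\,\Delta t^{h,\delta}(\hat\xi,\alpha),$$

and the conditional covariance of the martingale difference term $\beta_0^{h,\delta}$ is $\sigma^h(\hat\xi)[\sigma^h(\hat\xi)]'\,\Delta t^{h,\delta}(\hat\xi,\alpha) + o(\Delta t^{h,\delta}(\hat\xi,\alpha))$. Analogously to the expression below (3.6), define the stopping time

$$d^{h,\delta}(t) = \max\{n : t_n^{h,\delta} \le t\}.$$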

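As in Theorem 4.1, approximate the initial condition $\hat x(0)$ by $\hat\xi_0^{h,\delta}$ (in the sense of uniform convergence as $h \to 0$, $\delta \to 0$), and let it be constant on the intervals $[-\tau + k\delta, -\tau + k\delta + \delta)$, $k = 0, \ldots, Q-1$, with the values at $-k\delta$, $k = 0, \ldots, Q$, being in $G_h$. Let $\xi^{h,\delta}(\cdot)$ and $\phi^{h,\delta}(\cdot)$ denote the continuous-time interpolations of the $\{\xi_n^{h,\delta}\}$ and $\{\phi_n^{h,\delta}\}$ with the intervals $\{\Delta t_n^{h,\delta}\}$. With $\xi_0^{h,\delta} = \hat\xi_0^{h,\delta}(0)$, we can write

$$
\xi^{h,\delta}(t) = \xi_0^{h,\delta}
+ \sum_{i=0}^{d^{h,\delta}(t)-1} b^h(\hat\xi_i^{h,\delta}, u_i^{h,\delta})\,\Delta t_i^{h,\delta}
+ \sum_{i=0}^{d^{h,\delta}(t)-1} \beta_i^{h,\delta}
+ \sum_{i=0}^{d^{h,\delta}(t)-1} \Delta z_i^{h,\delta},
\tag{6.4}
$$

where the last sum collects the reflection terms, and

$$\phi_{n+1}^{h,\delta} - \phi_n^{h,\delta} = \Delta t_n^{h,\delta} + \beta_{0,n}^{h,\delta}. \tag{6.5}$$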


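Next let us get the analog of the process (4.4). Define the random variables $\nu_n^{h,\delta}$, $n = 0, 1, \ldots$, i.i.d., exponentially distributed, and independent of the $\{\xi_n^{h,\delta}, u_n^{h,\delta}\}$. Then set $\Delta\tau_n^{h,\delta} = \nu_n^{h,\delta}\,\Delta t_n^{h,\delta}$ and $\tau_n^{h,\delta} = \sum_{i=0}^{n-1} \Delta\tau_i^{h,\delta}$. Define $d_\tau^{h,\delta}(s) = \max\{n : \tau_n^{h,\delta} \le s\}$. Define $r_\tau^{h,\delta}(s) = t^{h,\delta}_{d_\tau^{h,\delta}(s)}$ and $q_\tau^{h,\delta}(s) = \phi^{h,\delta}_{d_\tau^{h,\delta}(s)}$. With these definitions, $\hat\xi^{h,\delta}_{d_\tau^{h,\delta}(s)} = \hat\xi^{h,\delta}(q_\tau^{h,\delta}(s))$. Let $\psi^{h,\delta}(\cdot)$ denote the interpolation with the random intervals $\Delta\tau_n^{h,\delta}$. Then, analogously to (4.4),

$$
\psi^{h,\delta}(t) = \xi_0^{h,\delta}
+ \int_0^t\!\!\int_U b^h(\hat\xi^{h,\delta}(q_\tau^{h,\delta}(s)), \alpha)\, m_\tau^{h,\delta}(d\alpha\, ds)
+ M^{h,\delta}(t) + z_\tau^{h,\delta}(t),
\tag{6.6}
$$

where the quadratic variation of the martingale is

$$\int_0^t \sigma^h(\hat\xi^{h,\delta}(q_\tau^{h,\delta}(s)))\,[\sigma^h(\hat\xi^{h,\delta}(q_\tau^{h,\delta}(s)))]'\, ds.$$

It follows from the proof of Theorem 3.1 that the time scales used in the $\xi^{h,\delta}(\cdot)$ and $\psi^{h,\delta}(\cdot)$ processes coincide asymptotically. I.e., $q^{h,\delta}(s) - s \to 0$, $r_\tau^{h,\delta}(s) - t^{h,\delta}(s) \to 0$, $q_\tau^{h,\delta}(s) - s \to 0$. The following theorem asserts this result and the fact that $\phi^{h,\delta}(t)$ converges to $t$.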

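Theorem 6.1. Let $\phi^{h,\delta}(\cdot)$ denote the interpolation of the $\{\phi_n^{h,\delta}\}$ with the intervals $\Delta t_n^{h,\delta}$. Then $\phi^{h,\delta}(\cdot)$ converges weakly to the process with value $t$ at time $t$. Also,

$$
\limsup_{h \to 0,\ \delta \to 0}\ E \sup_{s \le t} \left[\sum_{i=0}^{d^{h,\delta}(s)} \left(\Delta\tau_i^{h,\delta} - \Delta t_i^{h,\delta}\right)\right]^2 = 0.
$$

The last assertion holds with $d_\tau^{h,\delta}(\cdot)$ replacing $d^{h,\delta}(\cdot)$.

Proof. By (6.5),

$$\phi_n^{h,\delta} = \sum_{i=0}^{n-1} \Delta t_i^{h,\delta} + \sum_{i=0}^{n-1} \beta_{0,i}^{h,\delta}.$$

At interpolated time $t$, the first sum equals $t$, modulo $\sup_n \Delta t_n^{h,\delta}$. The variance of the martingale term is $\delta t$, modulo $\delta + \sup_n \Delta t_n^{h,\delta}$, and the term converges weakly to the zero process. The proof of the second assertion of the theorem is just that of Theorem 3.1. The last assertion of the theorem follows from the second. ■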

It follows from Theorem 6.1 that

$$\sup_{-\tau \le \theta \le 0,\ t \le T} \left|\ \cdots\ \right| \to 0. \tag{6.7}$$
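The cost function. Consider the cost function

$$
W^{h,\delta}(\hat\xi, u) = E_{\hat\xi}^{u} \sum_{i=0}^{\infty} e^{-\beta \phi_i^{h,\delta}}
\left[ k(\hat\xi_i^{h,\delta}, u_i^{h,\delta})\, \delta\, I_{\{\phi_{i+1}^{h,\delta} \ne \phi_i^{h,\delta}\}} + c'\delta y_i^{h,\delta} \right].
\tag{6.8}
$$

By using (6.1) and (6.2) and a conditional expectation, the term $\delta I_{\{\phi_{i+1}^{h,\delta} \ne \phi_i^{h,\delta}\}}$ can be replaced by $\Delta t_i^{h,\delta}$. We next show that (6.8) is well defined.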


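Theorem 6.2. For small $h, \delta$, (6.8) is asymptotically equal to (uniformly in such $h, \delta$)

$$
E_{\hat\xi}^{u} \sum_{i=0}^{\infty} e^{-\beta t_i^{h,\delta}}
\left[ k(\hat\xi_i^{h,\delta}, u_i^{h,\delta})\, \Delta t_i^{h,\delta} + c'\delta y_i^{h,\delta} \right].
\tag{6.9}
$$

Proof. To show that the term involving $k(\cdot)$ in (6.8) is well defined, first note that it can be bounded by a constant times the expectation of $\sum_{i=0}^{\infty} e^{-\beta \phi_i^{h,\delta}}\, \delta\, I_{\{\phi_{i+1}^{h,\delta} \ne \phi_i^{h,\delta}\}}$. By the computations in Theorem 6.1, for each $K > 0$ there are $\epsilon_i > 0$, that do not depend on the controls, such that for small enough $h, \delta$

$$P\left\{\phi^{h,\delta}(T + K) - \phi^{h,\delta}(T) \ge \epsilon_1 \mid \text{data to } T\right\} \ge \epsilon_2$$

w.p.1 for each $T$. Hence, there is $\epsilon_3 > 0$, not depending on the controls, such that for small enough $h, \delta$

$$E\left[e^{-\beta(\phi^{h,\delta}(T+K) - \phi^{h,\delta}(T))} \mid \text{data to } T\right] \le e^{-\epsilon_3}$$

w.p.1 for each $T$. This implies that the “tail” of the sum (6.8) can be neglected and we need only consider the sum $\sum_{i=0}^{N^{h,\delta}(t)}$, where $N^{h,\delta}(t) = \min\{n : t_n^{h,\delta} \ge t\}$ for arbitrary $t$. But, by Theorem 6.1, for such a sum the asymptotic values are the same if $\phi_i^{h,\delta}$, $i \le N^{h,\delta}(t)$, is replaced by $t_i^{h,\delta}$, $i \le N^{h,\delta}(t)$. Hence the terms involving $k(\cdot)$ in (6.8) and (6.9) are asymptotically equal. The above estimates and the inequality (4.3) yield the same result for the terms involving $\delta y^{h,\delta}$. ■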


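The Bellman equation. With the form (6.9), the effective canonical cost rate is just $k(\hat\xi, \alpha)$ times $\delta$ times the probability that the time variable advances, namely $k(\hat\xi, \alpha)\,\Delta t^{h,\delta}(\hat\xi, \alpha)$. This can be seen from (6.9), or from (6.8) with the replacement noted below it.

The Bellman equation can be based on either (6.8) or (6.9). They will yield different results, but will be asymptotically equal by Theorem 6.2. For (6.8) and $\hat\xi(0) = \xi_0 \in G_h$, the Bellman equation is (the time variable $\phi$ does not appear in the state since the dynamical terms are time independent)

$$
\begin{aligned}
V^{h,\delta}(\hat\xi) = \min_{\alpha \in U} \Bigg[
&\left(1 - \frac{\Delta t^h(\hat\xi,\alpha)}{\Delta t^h(\hat\xi,\alpha)+\delta}\right) \sum_{\pm} p^h(\hat\xi(0), \hat\xi(0) \pm h \mid \hat\xi, \alpha)\, V^{h,\delta}(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), \hat\xi(0) \pm h) \\
&+ e^{-\beta\delta}\, \frac{\Delta t^h(\hat\xi,\alpha)}{\Delta t^h(\hat\xi,\alpha)+\delta}\, V^{h,\delta}(\hat\xi(-\tau+\delta), \ldots, \hat\xi(-\delta), \hat\xi(0), \hat\xi(0)) \\
&+ k(\hat\xi,\alpha)\,\Delta t^{h,\delta}(\hat\xi,\alpha) \Bigg].
\end{aligned}
\tag{6.10}
$$

If $\hat\xi(0)$ is a reflecting point $-h$ or $B + h$, then

$$
V^{h,\delta}(\hat\xi) =
\begin{cases}
V^{h,\delta}(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), 0) + c_1 h & \text{for } \hat\xi(0) = -h, \\
V^{h,\delta}(\hat\xi(-\tau), \ldots, \hat\xi(-\delta), B) + c_2 h & \text{for } \hat\xi(0) = B + h.
\end{cases}
$$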

These equations make it clear that the full state at iterate $n$ is $\hat\xi_n^{h,\delta}$, namely, the current value of the spatial variable together with its value at the last $Q = \tau/\delta$ times that the time variable advances.
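To make the structure of this Bellman equation concrete, here is a minimal value-iteration sketch on a very small grid. Everything model-specific in it is an assumption made for illustration only: the drift $b(\hat\xi,\alpha) = \alpha - \hat\xi(-\tau)$, the running cost $k = \hat\xi(0)^2 + \alpha^2$, the three-point control set, the parameter values, and the upwind form of $p^h$ and $\Delta t^h$ (a common finite-difference choice; the paper's own definitions in Section 3 are not restated here). The time advance is discounted by $e^{-\beta\delta}$, matching the cost (6.8).

```python
import itertools
import math


def solve_implicit_bellman(n=4, Q=2, h=0.25, delta=0.25, sigma=1.0, beta=1.0,
                           controls=(-1.0, 0.0, 1.0), c1=1.0, c2=1.0,
                           tol=1e-9, max_sweeps=20000):
    """Gauss-Seidel value iteration for a toy implicit delay chain.

    A state is a tuple of Q + 1 grid indices (values in units of h):
    (xi(-tau), ..., xi(-delta), xi(0)).  Past values lie in {0, ..., n};
    the current value may step to the reflecting points -1 or n + 1.
    """
    states = [p + (c,)
              for p in itertools.product(range(n + 1), repeat=Q)
              for c in range(n + 1)]
    V = {s: 0.0 for s in states}

    def value(s):
        # Reflecting points: immediate return to G_h plus a reflection cost.
        if s[-1] < 0:
            return V[s[:-1] + (0,)] + c1 * h
        if s[-1] > n:
            return V[s[:-1] + (n,)] + c2 * h
        return V[s]

    def backup(s):
        x_now, x_old = s[-1] * h, s[0] * h
        best = math.inf
        for a in controls:
            b = a - x_old                       # hypothetical delayed drift
            norm = sigma ** 2 + h * abs(b)
            p_up = (sigma ** 2 / 2 + h * max(b, 0.0)) / norm
            p_dn = (sigma ** 2 / 2 + h * max(-b, 0.0)) / norm
            dt_h = h ** 2 / norm                # assumed explicit interval
            p_adv = dt_h / (dt_h + delta)       # time variable advances
            dt_impl = delta * p_adv             # implicit interval
            k = x_now ** 2 + a ** 2             # hypothetical running cost
            q = ((1.0 - p_adv) * (p_up * value(s[:-1] + (s[-1] + 1,))
                                  + p_dn * value(s[:-1] + (s[-1] - 1,)))
                 + math.exp(-beta * delta) * p_adv * value(s[1:] + (s[-1],))
                 + k * dt_impl)
            best = min(best, q)
        return best

    for _ in range(max_sweeps):
        change = 0.0
        for s in states:
            new = backup(s)
            change = max(change, abs(new - V[s]))
            V[s] = new                          # in-place (Gauss-Seidel) update
        if change < tol:
            break
    return V, backup


V, backup = solve_implicit_bellman()
residual = max(abs(backup(s) - V[s]) for s in V)
print(len(V), residual)
```

Only states whose current value is in $G_h$ are stored, since the values at reflecting points follow directly from the boundary relations. The iteration contracts because the time variable advances with positive probability at every step and each advance is discounted.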

Since  we  can  use  (6.9)  for  the  cost  function  when  proving  convergence,  the 
proof  of  the  next  theorem  is  nearly  identical  to  that  of  Theorem  4.1  which  is 
given  in  Section  8. 

Theorem 6.3. Let $\xi_n^{h,\delta}$ be locally consistent with the model (1.1), whose initial condition is $\hat x = \hat x(0)$, a continuous function. Let $\hat\xi_0^{h,\delta}$ approximate $\hat x$ as in Theorem 4.1. Assume (A1.1), (A1.3), and the analog of (A4.1) for the implicit procedure. With either (6.10) or the Bellman equation for (6.9) used, $V^{h,\delta}(\hat\xi_0^{h,\delta}) \to V(\hat x)$.

7 The Number of Points for the Implicit Method: State Delay Only

Comment on the value of $\delta$. Consider solving a parabolic PDE on a finite time interval, and with the classical estimates of rate of convergence holding. Typically $\delta = O(h)$ and the rate of convergence is $O(h^2) + O(\delta^2)$ for the implicit procedure, vs. $O(h^2) + O(\text{max time increment})$ for the explicit procedure [14, Chapter 6]. But, for the explicit procedure the value of the time increment is $O(h^2)$. Thus, for $\delta = O(h)$, the rates of convergence would be of the same order. There is no proof that such estimates hold for the control problem of concern here. But numerical data for the no-delay problems suggest that one should use $\delta = O(h)$.
The implicit procedure actually updates the path values using the time increment $\Delta t_n^{h,\delta}$ (see (6.1) and (6.2)), which is close to $\Delta t_n^h$ when $\delta = O(h)$ and $\Delta t_n^h = O(h^2)$. After some random number of updates of the path component, the time component advances. Thus the interpolation defined by (6.3) is essentially a sampling of the process at random intervals. Since the average sum of the $\Delta t_n^{h,\delta}$ between advances of time is approximately $\delta$, the sampling is approximately at time intervals $\delta$. This would give a more accurate construction than would a process constructed directly with the much larger discretization interval $\delta$. Additionally, the implicit procedure allows us to use the original time intervals $\Delta t_n^{h,\delta} \approx \Delta t_n^h$, and not the minimal value $\bar\Delta^h$. This is computationally advantageous when the values $\Delta t^h(\hat\xi, \alpha)$ vary a great deal, for example when the upper bound on the control is large. It will next be argued that when $\delta = O(h)$, the implicit procedure can be approximated such that it has a much smaller memory requirement than the explicit procedure.

Reduced memory requirements. The vector $\hat\xi$ can be represented in terms of the vector

$$\hat d = (\hat\xi(-\tau) - \hat\xi(-\tau+\delta), \ldots, \hat\xi(-\delta) - \hat\xi(0), \hat\xi(0)) = (d(Q), \ldots, d(1), d(0)).$$

If $\hat\xi(0)$ is a reflection point, then it moves immediately to the closest point in $G_h$. Otherwise, with this representation, the transitions are to (if the time variable does not advance)

$$(d(Q), \ldots, d(2), d(1) \mp h, d(0) \pm h)$$

or to (if the time variable advances)

$$(d(Q-1), \ldots, d(1), 0, d(0)).$$
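The difference representation can be checked against the direct path representation with a short sketch. It assumes states stored as tuples of integer grid indices (units of $h$), ordered from $\theta = -\tau$ to $\theta = 0$; the helper names `to_diff`, `from_diff`, `diff_move`, and `diff_advance` are hypothetical.

```python
def to_diff(state):
    """(xi(-tau), ..., xi(0)) -> (d(Q), ..., d(1), d(0))."""
    diffs = tuple(state[j] - state[j + 1] for j in range(len(state) - 1))
    return diffs + (state[-1],)


def from_diff(d):
    """Invert to_diff by summing the differences back from the current value."""
    values = [d[-1]]
    for diff in reversed(d[:-1]):
        values.append(values[-1] + diff)
    return tuple(reversed(values))


def diff_move(d, step):
    """No time advance: d(1) changes by -step and d(0) by +step (step = +-1)."""
    return d[:-2] + (d[-2] - step, d[-1] + step)


def diff_advance(d):
    """Time advance: shift the differences, insert a zero, keep d(0)."""
    return d[1:-1] + (0, d[-1])


# Cross-check against the direct path transitions for Q = 2.
s = (2, 3, 4)
d = to_diff(s)
moved_direct = to_diff(s[:-1] + (s[-1] + 1,))   # current value steps up
advanced_direct = to_diff(s[1:] + (s[-1],))     # window shifts
print(d, diff_move(d, +1), diff_advance(d))
```

Both `diff_move` and `diff_advance` agree with applying the path transition first and taking differences afterwards, so the two representations carry the same information.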

The variable $d(0)$ takes $B/h + 3$ possible values. Since there are a potentially unbounded number of steps before the time variable increases, the differences $d(i)$, $i \ge 2$, can take values in the set $G_h - G_h$, which is the set of points $\{B, B-h, \ldots, -B\}$. Hence there are $2B/h + 1$ possible values. The difference $d(1)$ can take values in $G_h - G_h^+ = \{B+h, B, B-h, \ldots, -B-h\}$, since $\hat\xi(0)$ takes values in $G_h^+$. But over the number of steps that are required for the time variable to advance, with a high probability the number of values taken by the $d(i)$ along a sample path will be much less, due to cancellations of positive and negative steps. This idea can be exploited by truncating the possible values of the $d(i)$, $i \ge 1$, by some $N_1 < 2B/h + 1$ such that the probability that $d(i)$ takes more than $N_1$ values is smaller than some predetermined number. (Footnote: More generally, one can approximate the range of the $d(i)$, $i \ge 1$, by allowing them to take some prespecified values that might not be integral multiples of $h$.) The maximum required number of points is $(B/h + 3)(2B/h + 3)(2B/h + 1)^{Q-1}$, where $Q = \tau/\delta$. Comparing this with the number

$$(B/h + 3)\,2^{\tau/\bar\Delta^h}, \quad \text{or} \quad (B/h + 3)\,3^{\tau/\bar\Delta^h},$$

for the explicit procedure, we see that there might be considerable saving even without truncation, since typically $\bar\Delta^h = O(h^2)$ and $\delta = O(h)$. This saving is due to the fact that, for the implicit procedure, the memory consists of the differences in the values attained over many steps, and not of the set of differences in the values for each of those steps.
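The following sketch evaluates the two counts for one set of parameters to give a feel for the orders of magnitude. It is an illustration only: the parameter values are assumptions, the function names are hypothetical, and the formulas coded are the untruncated implicit count $(B/h+3)(2B/h+3)(2B/h+1)^{Q-1}$ and the explicit count $(B/h+3)\,2^{\tau/\bar\Delta^h}$ as read from the text above.

```python
def implicit_points(B, h, tau, delta):
    """Untruncated state count for the implicit method, with Q = tau / delta."""
    n = round(B / h)
    Q = round(tau / delta)
    return (n + 3) * (2 * n + 3) * (2 * n + 1) ** (Q - 1)


def explicit_points(B, h, tau, dt_min, base=2):
    """State count for the explicit method with minimal interval dt_min."""
    n = round(B / h)
    steps = round(tau / dt_min)
    return (n + 3) * base ** steps


# Assumed sample values: sigma = 1, so delta = h and dt_min = h**2.
B, h, tau = 1.0, 0.1, 0.5
n_implicit = implicit_points(B, h, tau, delta=0.1)
n_explicit = explicit_points(B, h, tau, dt_min=0.01)
print(n_implicit, n_explicit)
```

With these values the implicit count uses $Q = 5$ memory intervals while the explicit count uses $50$, which is the source of the large gap even before any truncation of the range of the $d(i)$.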

We next present some computations concerning the value of $N_1$.


An example. For illustrative purposes, first consider the simplest example. Let $\sigma(\cdot)$ be constant and $b(\cdot)$ zero. There is no delay, but it will be seen that the estimates are typical even with a drift and delay. Then $\{\xi_n^h\}$ is a simple symmetric random walk with reflection and $\Delta t^h = h^2/\sigma^2$. Then use $\delta = h/\sigma$, which is very close to the probability that the time variable advances at each step if $h$ is small. Let $\nu$ be the number of steps required for the time variable to advance, and define $M_n^{h,\delta} = \sum_{i=0}^{n} [\xi_{i+1}^{h,\delta} - \xi_i^{h,\delta}]$. Then $E\nu^2 \approx \sigma^2/h^2$ and

$$
P\left\{\sup_{n \le \nu} \left|M_n^{h,\delta}\right| \ge \epsilon\right\}
\le \frac{E\left|M_\nu^{h,\delta}\right|^4}{\epsilon^4}
\le \frac{O(1)\, h^4 E\nu^2}{\epsilon^4}.
\tag{7.1}
$$

Let $N_1 = 2N + 1$, $\epsilon = hN$. Then the probability above is bounded by $O(\sigma^2)/(N^4 h^2)$. Thus we need $N$ to increase slightly faster than $1/\sqrt{h}$ to have an asymptotically negligible error. The number of memory points needed is approximately $(B/h + 3)[2/f(h) + 1]^{\tau/\delta} = (B/h + 3)[2/f(h) + 1]^{\tau\sigma/h}$, where $f(h)/\sqrt{h} \to 0$, as compared to the much larger number $(B/h + 3)\,2^{\tau\sigma^2/h^2}$ for the explicit procedure. The best way of getting $N$ or of efficiently approximating the range of the $d(i)$, $i \ge 1$, is not clear and much further work and numerical experience is required. But the idea is very appealing.

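Now, extend the above case by letting $b(\cdot)$ be non-zero, with delayed arguments and satisfying $|b(\hat\xi, \alpha)|h \le \sigma^2/2$. We still have $\Delta t^h = h^2/\sigma^2$. The number of points needed for the explicit method was noted above. Write $\xi_n^{h,\delta} - \xi_0^{h,\delta}$ as the sum of a drift term and a martingale term, and let $b_0 = \sup |b(\hat\xi, \alpha)|$. Now estimate $\sup_{n \le \nu} |\xi_n^{h,\delta} - \xi_0^{h,\delta}|$ by splitting the terms. For the drift term, we have the estimate

$$
P\left\{\sup_{n \le \nu} b_0 \sum_{i=0}^{n} \Delta t_i^{h,\delta} \ge Nh/2\right\}
\le 4\, \frac{[b_0^2 h^4/\sigma^4]\, E\nu^2}{N^2 h^2}
\approx \frac{4 b_0^2}{N^2 \sigma^2}.
$$

The estimate (7.1) continues to hold for the martingale term. Thus the martingale term dominates and the conclusions of the simpler case continue to hold.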


8 Comments on the Proof of Theorems 4.1 and 4.2


Proof. For notational simplicity, let us start with the case where the control is not delayed. The proof is close to that for the no-delay case and the structure will be outlined. Let $u^h = \{u_n^h, n < \infty\}$ be the optimal control sequence for the chain.

The main new issue over the no-delay case is that the process $\hat\xi^h(\cdot)$ appears in the dynamical terms. Recall the definitions, for $s \ge 0$, $d_\tau^h(s) = \max\{n : \tau_n^h \le s\}$ and $q_\tau^h(s) = t^h_{d_\tau^h(s)}$. Recall (4.4):

$$
\psi^h(t) = \xi_0^h + \int_0^t\!\!\int_U b^h(\hat\xi^h(q_\tau^h(s)), \alpha)\, m_\tau^h(d\alpha\, ds) + M^h(t) + z_\tau^h(t),
\tag{8.1}
$$

where $\psi^h(\cdot)$ is the continuous-time interpolation with intervals $\Delta\tau_n^h$, and $M^h(\cdot)$ is a martingale with quadratic variation process

$$\int_0^t \sigma^h(\hat\xi^h(q_\tau^h(s)))\,[\sigma^h(\hat\xi^h(q_\tau^h(s)))]'\, ds,$$

and can be written as [10, Section 10.4.1]

$$M^h(t) = \int_0^t \sigma^h(\hat\xi^h(q_\tau^h(s)))\, dw^h(s), \tag{8.2}$$

where $w^h(\cdot)$ is a martingale with quadratic variation process $It$. The discontinuities of $w^h(t)$ go to zero as $h \to 0$, and it converges to a standard Wiener process. The proofs of these assertions are the same as for the no-delay case in [10, Section 10.4]. Keep in mind that in all cases, $\xi^h(\cdot)$ and $\psi^h(\cdot)$ are constructed from the basic $\{\xi_n^h\}$ via the appropriate interpolation. Theorem 3.1 applies and shows that the time scales for $\xi^h(\cdot)$ and $\psi^h(\cdot)$ are asymptotically equal.

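The sequence $\{\psi^h(\cdot), \xi^h(\cdot), m_\tau^h(\cdot), w^h(\cdot), y_\tau^h(\cdot), q_\tau^h(\cdot)\}$, where $z_\tau^h(\cdot) = y_{1,\tau}^h(\cdot) - y_{2,\tau}^h(\cdot)$, is tight and all weak sense limits are continuous. Take a weakly convergent subsequence, also indexed by $h$ and with limit denoted by $(x(\cdot), \xi(\cdot), m(\cdot), w(\cdot), y(\cdot), q(\cdot))$, with $z(\cdot) = y_1(\cdot) - y_2(\cdot)$. The asymptotic continuity of $q_\tau^h(\cdot)$ is implied by Theorem 6.1, and that of $y_\tau^h(\cdot)$ is similar to the proof of a similar result in Theorem 2.1. See also the reference (Chapter 10). Assume the Skorohod representation so that the limits can be assumed to be w.p.1. By Theorem 6.1, $x(\cdot) = \xi(\cdot)$ and $q(t) = t$. Thus the process $\hat\xi^h(q_\tau^h(\cdot))$ converges to $\hat x(\cdot)$. Now, by the continuity conditions on $b(\hat x, \alpha)$ and $\sigma(\hat x)$ in (A1.1),

$$
\int_0^t\!\!\int_U b^h(\hat\xi^h(q_\tau^h(s)), \alpha)\, m_\tau^h(d\alpha\, ds) \to \int_0^t\!\!\int_U b(\hat x(s), \alpha)\, m(d\alpha\, ds),
\tag{8.3}
$$

$$
\int_0^t \sigma^h(\hat\xi^h(q_\tau^h(s)))\, dw^h(s) \to \int_0^t \sigma(\hat x(s))\, dw(s).
\tag{8.4}
$$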


Equation (8.3) follows directly from the weak convergence and the continuity properties of $b(\cdot)$. To prove (8.4), due to the “stochastic integral,” we need to discretize as in the proof of Theorem 2.1. For any function of a real variable $g(\cdot)$ and $\kappa > 0$, let $g_\kappa(s) = g(n\kappa)$, $n\kappa \le s < n\kappa + \kappa$. By the martingale and quadratic variation properties of $w^h(\cdot)$, the mean square value of

$$
\int_0^t \sigma^h(\hat\xi^h(q_\tau^h(s)))\, dw^h(s) - \int_0^t \sigma^h(\hat\xi_\kappa^h(q_\tau^h(s)))\, dw^h(s)
= \int_0^t \left[\sigma^h(\hat\xi^h(q_\tau^h(s))) - \sigma^h(\hat\xi_\kappa^h(q_\tau^h(s)))\right] dw^h(s)
$$

is just the mean value of the integral of the square of the term in brackets in the right hand integrand, and it goes to zero as $\kappa \to 0$, uniformly in $h$. The second integral on the left side can be written as a sum, and the weak convergence implies that its limit as $h \to 0$ is $\int_0^t \sigma(\hat x_\kappa(s))\, dw(s)$. The nonanticipativity of the limit processes with respect to the Wiener process $w(\cdot)$ is proved by using the analog of (2.2), namely,

$$E\, H\big(\ldots(s_i),\ i \le I,\ j \le J\big) \times \big(w^h(t + T) - w^h(t)\big) = 0,$$

and proceeding as below (2.2). Now, with the nonanticipativity proved, let $\kappa \to 0$ in $\int_0^t \sigma(\hat x_\kappa(s))\, dw(s)$ to get (8.4). Thus we have proved that the limit satisfies (1.2) for some relaxed control $m(\cdot)$.

Since $m^h(\cdot)$ is the relaxed control representation of the interpolation of the optimal control sequence with intervals $\Delta t_n^h$, by the definitions of $W^h(\cdot)$ and $V^h(\cdot)$ we have $V^h(\hat\xi_0^h) = W^h(\hat\xi_0^h, u^h)$. By the weak convergence and the continuity properties of $k(\cdot)$ in (A1.1), $W^h(\hat\xi_0^h, u^h) \to W(\hat x, m)$. By the minimality of $V(\hat x)$, we must have $\liminf_h V^h(\hat\xi_0^h) \ge V(\hat x)$. We need only prove that

$$\limsup_h V^h(\hat\xi_0^h) \le V(\hat x). \tag{8.5}$$

The proof of (8.5) for the no-delay case in [10, Chapter 10] can be readily adapted to the delay case. The proof depends on getting a piecewise constant approximation to the optimal control for (1.2) in terms of past samples of the driving Wiener process and control. The details of this approximation are a little more complicated for the present case, but the method in [10, Theorem 3.1, Chapter 10] carries over with some notational changes. The construction depended only on the continuity of the dynamical terms and weak-sense uniqueness, as in (A1.3). One needs to add the full initial condition of interest, including the initial control segment, where applicable. With this in hand, the converse inequality (8.5) is obtained as in the book. Owing to lack of space, the details will not be given here.

The proof of Theorem 6.3 is the same. Just use the interpolations $\xi^{h,\delta}(\cdot)$ and $\psi^{h,\delta}(\cdot)$ defined by (6.4) and (6.6), resp.

Delay in the control. Now consider Theorem 4.2, where the control is also delayed. The bracketed term in (4.9) converges to

$$\int_0^t\!\!\int_U b(\hat x(s), \alpha, v)\, m(d\alpha, ds + v)$$

for all $v \in [-\tau, 0]$. The rest of the details of the convergence proof are as for the case where only the state is delayed. ■